Studies in Beamforming, Tracking, Neural Networks, Mathematics

Scientific and Engineering Studies

Compiled 1991

Naval Underwater Systems Center


Studies in Beamforming, Tracking, Neural Networks, Mathematics

R. L. Streit




Scientific and Engineering Studies


Statement  A  per  telecon  Roy  Streit 

NUSC/Code 22101, Compiled 1991

Newport,  Rhode  Island  02841-5047 


NAVAL  UNDERWATER  SYSTEMS  CENTER 


NEWPORT  LABORATORY,  NEWPORT,  RHODE  ISLAND 
NEW  LONDON  LABORATORY,  NEW  LONDON,  CONNECTICUT 




Preface 


The  four  parts  of  this  collection  of  technical  articles,  reports 
and  memoranda  deal  with  beamforming  studies  (12  papers), 
frequency  line  detector/trackers  (9  papers),  artificial  neural 
networks  (3  papers),  and  mathematical  studies  (10  papers). 
The  content  of  these  34  papers  is  discussed  in  the  foreword 
provided  for  each  part  of  this  collection. 


Dr.  William  I.  Roderick 
Associate  Technical  Director  for  Technology 
NAVAL  UNDERWATER  SYSTEMS  CENTER 


COMPILED  1991 




TABLE  OF  CONTENTS 


BEAMFORMING  STUDIES 

FOREWORD

1. In Situ Optimal Reshading of Arrays with Failed Elements, M. S. Sherrill and R. L. Streit, IEEE Journal of Oceanic Engineering, vol. OE-12, no. 1, January 1987, pp. 155-162

2. In Situ Optimal Reshading of Arrays with Failed Elements: Algorithm Documentation Package, M. S. Sherrill and R. L. Streit, NUSC Technical Document 8815, Naval Underwater Systems Center, New London, CT, 21 February 1991

3. A General Chebyshev Complex Function Approximation Procedure and an Application to Beamforming, R. L. Streit and A. H. Nuttall, Journal of the Acoustical Society of America, vol. 72, no. 1, July 1982, pp. 181-190

4. Optimization of Discrete Arrays of Arbitrary Geometry, R. L. Streit, Journal of the Acoustical Society of America, vol. 69, no. 1, January 1981, pp. 199-212

5. The Effect of Interchannel Crosstalk on Array Performance, R. L. Streit, Journal of the Acoustical Society of America, vol. 86, no. 5, November 1989, pp. 1827-1834

6. A Two-Parameter Family of Weights for Nonrecursive Digital Filters and Antennas, R. L. Streit, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 1, February 1984, pp. 108-118

7. Orthogonal Polynomial Based Array Design, R. L. Streit, NUSC Technical Memorandum 851015, Naval Underwater Systems Center, New London, CT, 24 January 1985

8. A Discussion of Taylor Weighting for Continuous Apertures, R. L. Streit, NUSC Technical Memorandum 851004, Naval Underwater Systems Center, New London, CT, 4 January 1985

9. Sufficient Conditions for the Existence of Optimum Beam Patterns for Unequally Spaced Linear Arrays with an Example, R. L. Streit, IEEE Transactions on Antennas and Propagation, vol. AP-23, no. 1, January 1975, pp. 111-115

10. Optimized Symmetric Discrete Line Arrays, R. L. Streit, IEEE Transactions on Antennas and Propagation, vol. AP-23, no. 6, November 1975, pp. 860-862

11. Real Excitation Coefficients Suffice for Sidelobe Control in a Linear Array, J. T. Lewis and R. L. Streit, IEEE Transactions on Antennas and Propagation, vol. AP-30, no. 6, November 1982, pp. 1262-1263

12. Two Exponential Approximation Methods, R. L. Streit, NUSC Technical Report 6357, Naval Underwater Systems Center, CT, 28 October 1980


FREQUENCY LINE DETECTOR/TRACKERS

FOREWORD

13. Frequency Line Tracking Using Hidden Markov Models, R. L. Streit and R. F. Barrett, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 4, April 1990, pp. 586-598

14. Nonlinear Frequency Line Tracking Algorithms, A. K. Steele, R. L. Streit and R. F. Barrett, Proceedings of the Australian Symposium on Signal Processing and Its Applications, Adelaide, Australia, 17-19 April 1989, pp. 258-262

15. Frequency Line Tracking Algorithms, R. F. Barrett, A. K. Steele and R. L. Streit, in Underwater Acoustic Data Processing, Y. T. Chan, ed., Kluwer Academic, Dordrecht, 1985, pp. 497-501

16. Frequency Line Tracking Using Hidden Markov Models with Phase Information, R. F. Barrett and R. L. Streit, Proceedings of the Second International Symposium on Signal Processing and Its Applications, Gold Coast, Australia, 27-31 August 1990, pp. 243-246

17. Frequency Line Detector/Tracker Using Hidden Markov Models with Amplitude Information, R. L. Streit, NUSC Technical Memorandum 911143, Naval Underwater Systems Center, New London, CT, 20 June 1991

18. Estimation of Signal Amplitude and Background Noise Power in Hidden Markov Model Detector/Trackers, R. L. Streit, NUSC Technical Memorandum 911189, Naval Underwater Systems Center, New London, CT, 19 July 1991

19. Automatic Detection of Frequency Modulated Spectral Lines, R. F. Barrett and R. L. Streit, Proceedings of the Australian Symposium on Signal Processing and Its Applications, Adelaide, Australia, 17-19 April 1989, pp. 283-287

20. The Moments of Matched and Mismatched Hidden Markov Models, R. L. Streit, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 4, April 1990, pp. 610-622

21. Connection Machine Implementation of Hidden Markov Models for Frequency Line Tracking, J. L. Munoz and R. L. Streit, in Very Large Scale Computation in the 21st Century, J. P. Mesirov, ed., Society for Industrial and Applied Mathematics, Philadelphia, PA, 1991, pp. 204-220




ARTIFICIAL  NEURAL  NETWORK  STUDIES 


FOREWORD

22. A Neural Network for Optimum Neyman-Pearson Classification, R. L. Streit, Proceedings of the International Joint Conference on Neural Networks, San Diego, CA, June 17-21, 1990, International Neural Network Society, vol. 1, pp. 685-690

23. Maximum Likelihood Training of Probabilistic Neural Networks, R. L. Streit and T. E. Luginbuhl, NUSC Technical Memorandum 911277, Naval Underwater Systems Center, New London, CT, 30 December 1991

24. Class Priors for Entropy Maximization, R. L. Streit, NUSC Technical Memorandum 911144, Naval Underwater Systems Center, New London, CT, 13 June 1991

MATHEMATICS 

FOREWORD

25. Saddle Points and Overdetermined Complex Equations, R. L. Streit, Linear Algebra and Its Applications, vol. 64, 1985, pp. 57-76

26. A Note on the Semi-Infinite Programming Approach to Complex Approximation, R. L. Streit and A. H. Nuttall, Mathematics of Computation, vol. 40, no. 162, April 1983, pp. 599-605

27. Solution of Systems of Complex Linear Equations in the l∞ Norm with Constraints on the Unknowns, R. L. Streit, SIAM Journal on Scientific and Statistical Computing, vol. 7, no. 1, January 1986, pp. 132-149

28. Algorithm 635: An Algorithm for the Solution of Complex Linear Equations in the l∞ Norm with Constraints on the Unknowns, R. L. Streit, ACM Transactions on Mathematical Software, vol. 11, no. 3, September 1985, pp. 242-249

29. Polynomial Iteration for Nonsymmetric Indefinite Linear Systems, H. C. Elman and R. L. Streit, Proceedings of the Fourth IIMAS Workshop, Guanajuato, Mexico, July 23-27, 1984, in Lecture Notes in Mathematics, no. 1230, J. P. Hennart, ed., Springer-Verlag, New York, pp. 103-117

30. Extremals and Zeros in Markov Systems Are Monotone Functions of One Endpoint, R. L. Streit, in Theory of Approximation with Applications, A. G. Law and B. N. Sahney, eds., Academic Press, New York, 1976, pp. 387-401

31. Concertina-Like Movement in the Absence of a Chebyshev System, J. T. Lewis and R. L. Streit, Journal of Approximation Theory, vol. 36, no. 4, December 1982, pp. 364-367

32. Limits of Chebyshev Polynomials when the Argument is a Ratio of Cosines, R. L. Streit, Journal of Approximation Theory, vol. 40, no. 4, April 1984, pp. 393-395

33. A Routine for Numerical Solution of Fredholm Integral Equations, R. L. Streit and A. H. Nuttall, NUSC Technical Memorandum TC-108-72, Naval Underwater Systems Center, New London, CT, 19 May 1972

34. Solution of Large Hermitian Eigenproblems on Virtual and Cache Memory Computers, R. L. Streit, Association for Computing Machinery SIGNUM Newsletter (Special Interest Group on Numerical Mathematics), vol. 16, no. 2, June 1981, pp. 6-7


BEAMFORMING  STUDIES 


Foreword 


Sidelobe suppression in acoustic arrays is an important practical problem that gives rise to challenging mathematical problems, many of which cannot be solved analytically. Moreover, some of the mathematical problems encountered are new and, although interesting in themselves, have not been studied in the literature. The novelty and size of these problems make the development of numerical algorithms for their solution very difficult. Papers [1]-[3]* of this compilation make the point clearly for the case of linear arrays with missing elements. Supporting mathematical background is found in papers [26]-[28] of this compilation. Sidelobe optimality is stressed in these papers. A different definition of optimality is presented in paper [4]. The methods of these four papers are applicable to acoustic arrays of arbitrary geometry.

Interchannel  crosstalk  between  acoustic  channels  can  occur  if  array  telemetry 
systems  are  imperfect.  The  potentially  debilitating  effects  of  crosstalk  on  beamforming 
performance  are  analyzed  in  paper  [5].  This  paper  presents  the  first  theoretical  study  of  the 
effect  of  crosstalk  on  beamformer  performance.  It  is  shown  that  crosstalk  can  be  corrected 
before  the  channels  enter  the  beamformer,  provided  the  crosstalk  levels  do  not  exceed  an 
upper  bound  derived  from  the  crosstalk  transfer  function. 

Families  of  weights  (shading  coefficients)  are  often  used  for  linear  arrays  for 
sidelobe  reduction.  A  two-parameter  family  of  weights  is  given  in  paper  [6]  for  discrete 
and  continuous  linear  arrays.  Of  particular  interest  in  this  paper  is  the  discrete  version  of 
the  Kaiser-Bessel  window  for  continuous  apertures.  More  general  weight  families  are 
discussed in [7]. Taylor weights are unrelated to these families and are discussed in [8].

A  method  for  optimal  and  suboptimal  weight  design  suitable  only  for  linear  arrays 
is  presented  in  papers  [9]  and  [10].  Mathematical  results  related  to  this  work  are  found  in 
papers  [31]  and  [32]  of  this  compilation.  Paper  [11]  shows  that  phased  weights  are 
unnecessary  for  optimal  sidelobe  suppression  in  steerable  linear  arrays.  Paper  [12] 
discusses  the  element  location  problem  for  unequally  spaced  linear  arrays. 


*  Papers  are  referred  to  in  the  order  that  they  appear  in  the  compilation. 




In Situ Optimal Reshading of Arrays with Failed Elements


M.  S.  Sherrill  and  R.  L.  Streit 




IEEE JOURNAL OF OCEANIC ENGINEERING, VOL. OE-12, NO. 1, JANUARY 1987




In Situ Optimal Reshading of Arrays with Failed Elements

MICHAEL S. SHERRILL and ROY L. STREIT, SENIOR MEMBER, IEEE
(Invited Paper)


Abstract— An  algorithm  is  presented  which  computes  optimal  weights 
for arbitrary linear arrays. The application of this algorithm to in situ optimal reshading of arrays with failed elements is discussed. It is shown that optimal reshading can often regain the original sidelobe level by slightly increasing the mainlobe beamwidth. Three examples are presented to illustrate the algorithm's effectiveness. Hardware and software issues are discussed. Execution time for a 25-element array is typically between 1 and 2 min on an HP9836C microcomputer.

I.  Introduction 

A linear array of discrete elements (sensors) often experiences element failures in situ. These failures can significantly increase the sidelobe levels of the array wavenumber response, depending on how many elements fail and where the failed elements are located within the array. We discuss here an optimal reshading (reweighting) algorithm which can be applied in situ to reduce the sidelobe levels to the original design level. In many common element-failure situations, optimal reshading can regain the original sidelobe level by slightly increasing the mainlobe beamwidth. In arrays which experience significant element failures, optimal reshading is still possible but may be of limited use. The three examples given below demonstrate a few of the possibilities.

An algorithm for optimal reshading was first proposed in [1] by Streit and Nuttall. Their algorithm used the general-purpose subroutine [2] to solve a specially structured "linear programming" problem. Unfortunately, their algorithm required hours of computation time and large amounts of computer storage on a minicomputer (the VAX 11/780) to optimally reshade a 50-element array with five failed elements. Consequently, their algorithm is not useful for in situ optimal reshading.

The shading algorithm proposed here differs from Streit and Nuttall's primarily in that we solve their linear programming problem using a new general-purpose subroutine [3], [4], herein referred to as Algorithm 635. Algorithm 635 exploits the special structure of the linear programming problem to reduce time and storage requirements by orders of magnitude. Algorithm 635 can be incorporated easily in Streit and Nuttall's original approach. A significant algorithmic improvement was discovered in the course of this study and is described below. The resulting shading algorithm is fast enough and small enough to execute successfully on microcomputers (such as the HP9836C used here) in only a few minutes. Typical execution time for a 25-element array is under 2 min; for a 50-element array, execution time is typically under 10 min. The current algorithm and the HP9836C, with its inherent transportability, together constitute an effective system for optimal reshading in situ.

Manuscript received March 11, 1986; revised August 11, 1986.

The authors are with the Naval Underwater Systems Center, New London, CT 06320.

IEEE Log Number 8714258

II.  Optimal  Array  Shading 

The wavenumber response of a linear array composed of $N$ discrete omnidirectional elements located at arbitrary fixed positions $x_n$ is given by

$$T(k) = \sum_{n=1}^{N} w_n \exp(-i k x_n) \tag{1}$$

where $w_n$ are the element weights and the independent variable $k$ denotes wavenumber in radians per unit length. The element weights are required to be real, but this entails no loss of generality (see Section III below). Also, from (1), $T(-k) = T^*(k)$ for real weights (the asterisk denotes complex conjugation), so it is unnecessary to consider negative values of $k$, and we confine our attention to nonnegative $k$.
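As an illustration only, the following Python sketch evaluates (1) directly for a small array and checks the conjugate symmetry $T(-k) = T^*(k)$ noted above. The helper name `array_response` and the element positions and weights are hypothetical values chosen for the example, not data from this paper.

```python
import cmath


def array_response(weights, positions, k):
    """Wavenumber response T(k) = sum_n w_n * exp(-i k x_n), as in (1)."""
    return sum(w * cmath.exp(-1j * k * x) for w, x in zip(weights, positions))


# Hypothetical 5-element array; positions in the same length unit as 1/k.
positions = [0.0, 0.5, 1.0, 1.5, 2.0]
weights = [0.1, 0.2, 0.4, 0.2, 0.1]  # real weights summing to 1

k = 2.3
t_plus = array_response(weights, positions, k)
t_minus = array_response(weights, positions, -k)

# For real weights, T(-k) is the complex conjugate of T(k),
# so only nonnegative wavenumbers need to be examined.
print(abs(t_minus - t_plus.conjugate()))
# T(0) is simply the sum of the weights.
print(array_response(weights, positions, 0.0))
```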

The array response as a function of $k$ can be considered to be composed of a mainlobe region and a sidelobe region. The objective of the optimization process is to make $|T(k)|$ as small as possible on the user-specified sidelobe interval. Array weights which achieve this objective are said to be optimal. The optimization process usually produces sidelobes of equal height in the sidelobe region.

Weights  that  are  optimal  for  a  full  array  do  not  remain 
optimal  after  the  array  experiences  element  failures.  To 
partially  compensate  for  failed  elements,  the  array  is  optimally 
reshaded  by  undertaking  the  optimization  process  again  and 
incorporating  knowledge  of  which  elements  have  failed.  As 
the  examples  below  will  show,  the  effectiveness  of  this 
strategy  depends  upon  how  many  elements  have  failed  and  the 
location  of  these  elements  in  the  array. 

The sidelobe interval is defined differently depending on the interelement spacing of the array. For an array with periodically spaced elements and no failures, the sidelobe interval is defined to be $[K_0, (2\pi/D) - K_0]$, where $K_0$ is calculated from the desired sidelobe level and the number $N$ of array elements.¹ $D$ is the physical distance from sensor to sensor.

¹ For an $N$-element array and $-\rho$ dB peak sidelobes, we have $K_0 = (2/D)\arccos(1/z_0)$, where $2z_0 = [r - (r^2 - 1)^{1/2}]^{1/M} + [r + (r^2 - 1)^{1/2}]^{1/M}$, $r = 10^{\rho/20}$, and $M = N - 1$. The interelement spacing $D$ is assumed to be half of the so-called design wavelength, and $N$ is the number of array elements before failures.
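The footnote's formula for $K_0$ is easy to evaluate numerically. The sketch below is a hedged illustration that reads the sidelobe level as $-\rho$ dB with $r = 10^{\rho/20}$ and uses a hypothetical case (25 elements, 30-dB sidelobes, spacing 0.5); the function name `sidelobe_edge` is ours. The formula coincides with the familiar Dolph-Chebyshev parameter $z_0 = \cosh(\operatorname{arccosh}(r)/M)$, and at $k = K_0$ the product $z_0 \cos(K_0 D/2)$ equals 1, which the test checks.

```python
import math


def sidelobe_edge(n_elements, rho_db, spacing):
    """Return (K0, z0) from footnote 1 for an N-element periodic array.

    rho_db > 0 is the sidelobe level in dB below the mainlobe peak,
    r = 10**(rho/20), M = N - 1, and
    2*z0 = [r - sqrt(r^2 - 1)]**(1/M) + [r + sqrt(r^2 - 1)]**(1/M).
    """
    m = n_elements - 1
    r = 10.0 ** (rho_db / 20.0)
    root = math.sqrt(r * r - 1.0)
    z0 = 0.5 * ((r - root) ** (1.0 / m) + (r + root) ** (1.0 / m))
    k0 = (2.0 / spacing) * math.acos(1.0 / z0)  # K0 = (2/D) arccos(1/z0)
    return k0, z0


# Hypothetical case: 25 elements, -30 dB sidelobes, spacing D = 0.5.
D = 0.5
k0, z0 = sidelobe_edge(n_elements=25, rho_db=30.0, spacing=D)
# The periodic-array sidelobe interval is then [K0, 2*pi/D - K0].
print(k0, 2.0 * math.pi / D - k0)
```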




Furthermore, the minimization interval can be reduced to $[K_0, \pi/D]$, since the response of this array is symmetric about $k = \pi/D$. $K_0$ is typically the point on the mainlobe response which is equal in magnitude to the peak of the sidelobes, but this is not always true for seriously degraded and/or aperiodic arrays (see Example 3 below). For arrays with aperiodically spaced elements, the sidelobe interval, denoted by $[K_0, K_1]$, must be chosen by inspection of a nonoptimal beam pattern or by some other means. $|T(k)|$ must be minimized over the full $[K_0, K_1]$ range since, in general, an aperiodic array response is not symmetric about any wavenumber other than $k = 0$. The ability to specify arbitrary $K_0$ and $K_1$ is particularly useful for applications involving aperiodically spaced elements, because lower sidelobe levels may be obtained by examining different minimization regions.
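A quick numerical check of the symmetry argument: for a periodic array with spacing $D$ and real weights, $|T(2\pi/D - k)| = |T(k)|$, so the response is symmetric about $k = \pi/D$, whereas moving elements off the grid generally destroys this symmetry. The sketch below assumes uniform weights and hypothetical element positions; `mirror_gap` is an illustrative helper, not part of the authors' program.

```python
import cmath
import math


def response_magnitude(weights, positions, k):
    """|T(k)| from (1)."""
    return abs(sum(w * cmath.exp(-1j * k * x) for w, x in zip(weights, positions)))


def mirror_gap(weights, positions, spacing, k):
    """How far |T(k)| is from |T(2*pi/D - k)|; zero for a periodic array."""
    mirrored = 2.0 * math.pi / spacing - k
    return abs(response_magnitude(weights, positions, k)
               - response_magnitude(weights, positions, mirrored))


D = 0.5
weights = [0.25] * 4
periodic = [0.0, 0.5, 1.0, 1.5]   # elements on a grid of spacing D
aperiodic = [0.0, 0.5, 1.3, 1.6]  # hypothetical irregular positions
print(mirror_gap(weights, periodic, D, 1.0))   # at rounding level
print(mirror_gap(weights, aperiodic, D, 1.0))  # clearly nonzero
```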

The  optimization  process  deals  with  element  failures  in  an 
array  in  the  following  way. 

Step  1.  Maintain  mainlobe  beamwidth  and  permit  the 
sidelobe  levels  to  rise. 

Step  2.  Regain,  if  possible,  the  original  sidelobe  level  by 
broadening  the  mainlobe. 

Broadening the mainlobe by increasing $K_0$ (step 2) is performed only if the sidelobe level, even after optimal reshading, has risen to an unacceptable value because of element failures. Thus step 1 is the normal algorithmic procedure, and step 2 requires some iteration in specifying $K_0$ and/or $K_1$ because a compromise must be made between the mainlobe beamwidth and the level of the sidelobes.

The solution of the array problem in the original formulation [1] is mathematically equivalent to solving an overdetermined system of complex linear equations. Unacceptably high sidelobes result if this system is solved in the usual least squares sense, so it is necessary to solve the system so that the magnitude of the maximum residual error is minimized. An efficient algorithm [3] and corresponding FORTRAN code [4] now exist for solving problems of this sort to high accuracy.

To obtain the beamformer equation in a format appropriate for this algorithm, we normalize the peak response of $T(k)$ so that $T(0) = 1$. This gives

$$\sum_{n=1}^{N} w_n = 1. \tag{2}$$

We solve (2) for the $N$th weight $w_N$ and substitute in (1) to obtain

$$T(k) = \exp(-i k x_N) + \sum_{n=1}^{N-1} w_n \left[\exp(-i k x_n) - \exp(-i k x_N)\right]. \tag{3}$$

By sampling $T(k)$ at the $M$ equispaced points

$$k_m = K_0 + \frac{K_1 - K_0}{M - 1}\,(m - 1), \qquad m = 1, \ldots, M, \tag{4}$$

we can write the problem of minimizing the peak sidelobe level of the array response as

$$\min_{w_1, \ldots, w_{N-1}} \; \max_{1 \le m \le M} \left| f_m - \sum_{n=1}^{N-1} a_{mn} w_n \right| \tag{5}$$

where the complex numbers $f_m$ and $a_{mn}$ are defined by

$$f_m = \exp(-i k_m x_N), \qquad a_{mn} = \exp(-i k_m x_N) - \exp(-i k_m x_n). \tag{6}$$

The problem (5) is precisely the form necessary for application of Algorithm 635. For theoretical details of this algorithm, the interested reader is referred to [3].

Sometimes a few of the optimum weights for arrays with failed elements are observed to be negative, particularly those on the end elements. If the weights are applied in hardware, providing a 180° phase factor on the element output may not be desirable or possible. However, Algorithm 635 allows the selection of all nonnegative weights; this is implemented by adding constraints to (5). Usually, but not always, an element is zeroed if it would have had a negative weight. From (2) it follows that, if all the element weights are required to be nonnegative, they must lie between 0 and 1. The requirement that the weights $w_1, \ldots, w_{N-1}$ lie between 0 and 1 can be written mathematically as

$$\left| w_n - \tfrac{1}{2} \right| \le \tfrac{1}{2}, \qquad n = 1, \ldots, N-1. \tag{7}$$

Algorithm 635 requires these $N - 1$ constraints. Algorithm 635 can also incorporate any number of general constraints of the form

$$\sum_{n=1}^{N-1} c_{mn} w_n \le d_m, \qquad m = 1, 2, \ldots, L \tag{8}$$

where $c_{mn}$ and $d_m$ are constants. The requirement that $w_N$ also be nonnegative gives

$$w_N = 1 - \sum_{n=1}^{N-1} w_n \ge 0$$

or

$$\sum_{n=1}^{N-1} w_n \le 1 \tag{9}$$

which is clearly a special case of the general constraints (8).
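The change of variables in (2) through (6) can be checked numerically. The Python sketch below, assuming a hypothetical 5-element array and trial weights, builds the samples $k_m$ of (4) and the coefficients $f_m$ and $a_{mn}$ of (6), confirms that the residual $f_m - \sum_n a_{mn} w_n$ inside (5) reproduces $T(k_m)$ from (1) once $w_N = 1 - \sum_{n<N} w_n$, and tests the nonnegativity constraints (7) and (9). It does not solve the minimax problem (5) itself, which is the job of Algorithm 635; all function names are our own.

```python
import cmath


def sample_points(k0, k1, m_samples):
    """Equispaced wavenumbers k_1, ..., k_M of (4) spanning [K0, K1]."""
    step = (k1 - k0) / (m_samples - 1)
    return [k0 + step * m for m in range(m_samples)]  # m runs 0..M-1 here


def minimax_data(positions, ks):
    """Coefficients f_m and a_mn of (6); positions[-1] plays the role of x_N."""
    x_last = positions[-1]
    f = [cmath.exp(-1j * k * x_last) for k in ks]
    a = [[cmath.exp(-1j * k * x_last) - cmath.exp(-1j * k * x) for x in positions[:-1]]
         for k in ks]
    return f, a


def residuals(f, a, w_free):
    """f_m - sum_n a_mn w_n, the quantity whose peak magnitude (5) minimizes."""
    return [fm - sum(amn * wn for amn, wn in zip(row, w_free)) for fm, row in zip(f, a)]


def satisfies_constraints(w_free, tol=1e-12):
    """Constraints (7) and (9): |w_n - 1/2| <= 1/2 and sum(w_free) <= 1."""
    in_box = all(abs(wn - 0.5) <= 0.5 + tol for wn in w_free)
    return in_box and sum(w_free) <= 1.0 + tol


# Hypothetical 5-element array and trial weights w_1, ..., w_{N-1}.
positions = [0.0, 0.5, 1.0, 1.5, 2.0]
w_free = [0.1, 0.2, 0.4, 0.2]
w_all = w_free + [1.0 - sum(w_free)]  # w_N from the normalization (2)

ks = sample_points(k0=1.5, k1=6.0, m_samples=16)
f, a = minimax_data(positions, ks)
r = residuals(f, a, w_free)

# Direct evaluation of (1) at the same wavenumbers, for comparison.
direct = [sum(w * cmath.exp(-1j * k * x) for w, x in zip(w_all, positions)) for k in ks]
peak = max(abs(v) for v in r)  # objective value of (5) for these trial weights
print(max(abs(u - v) for u, v in zip(r, direct)), peak, satisfies_constraints(w_free))
```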


III. Algorithm Improvements

Several changes to the algorithm presented in [1] significantly reduce its computational burden. Lewis and Streit [5] have proved that, for a general line array shaded so that it has optimal sidelobe levels when steered through the same number of degrees on either side of broadside, there exists a set of optimal weights that are real. Thus complex weights need not be considered. This fact allows an approximately eight-fold reduction in computation time and a two-fold reduction in storage requirements.




It is clear that the 50-element example run in Streit and Nuttall [1] was significantly oversampled in wavenumber. Their beam pattern can be reproduced with a four-fold reduction in the sampling of $T(k)$ (see Example 2 below), and this in no way detracts from the practical application of the algorithm. A significant reduction in computation time is realized by decreasing the number $M$ of beam pattern samples in (4).

A significant modification to Algorithm 635 further decreases computation time. We have labeled this modification "fast costing," and it is an important step in making the algorithm feasible on microcomputers such as the HP9836C. To describe this modification properly, some familiarity with the simplex method of linear programming and with reference [3] is assumed.

Algorithm 635 can be broken into two fundamental computational operations called "costing" and "pivoting." "Costing" determines the so-called minimum reduced cost coefficient and requires $2NM$ multiplications, where $N$ is the number of discrete array elements and $M$ is the number of samples taken of the beam pattern. "Pivoting" is a basis update and requires $N^2$ real multiplications. The speed of the algorithm is therefore intimately related to the number $M$ of beam pattern samples as well as to the number $N$ of discrete array elements. Since $M$ is larger than $N$, "costing" requires more multiplications than "pivoting."
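To make the operation counts concrete, the sketch below evaluates the per-iteration figures quoted above, $2NM$ multiplications for costing and $N^2$ for pivoting, for the largest problem the program is dimensioned for in Section IV ($N = 50$, $M = 256$). These are only the counts stated in the text, not a timing model of the HP9836C implementation; the function name is hypothetical.

```python
def multiplications_per_iteration(n_elements, m_samples):
    """Per-iteration multiplication counts quoted in the text."""
    costing = 2 * n_elements * m_samples  # minimum reduced cost search: 2NM
    pivoting = n_elements ** 2            # basis update: N^2 real multiplications
    return costing, pivoting


costing, pivoting = multiplications_per_iteration(n_elements=50, m_samples=256)
print(costing, pivoting)   # 25600 2500
print(costing / pivoting)  # costing dominates whenever 2M > N
```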

"Costing"  in  the  linear  array  application  means  that,  in  each 
simplex iteration, the "discretized absolute value" of every sidelobe sample of the wavenumber response function $T(k_m)$, $m = 1, \ldots, M$, is computed to determine the "minimum reduced cost coefficient" of the current "basic feasible solution." By proceeding through a finite sequence of such basic feasible solutions, we arrive at the solution of the "discretized problem." As shown in [3], this implies that the computed optimal wavenumber response function can have sidelobe levels that are theoretically at most 0.04 dB higher than the true optimum sidelobe level.² "Fast costing" refers simply to the fact that we first determine which of the sidelobe samples $T(k_m)$, $m = 1, \ldots, M$, has the largest true absolute value, and then compute the "discretized absolute value" of this one complex number. Therefore, only one "discretized absolute value" calculation is performed in each simplex iteration instead of $M$ such calculations. The resulting reduction in computational effort is significant in microcomputing environments. The drawback is that the use of "fast costing" prevents the simplex algorithm from converging to a solution of the "discretized problem." Fortunately, however, it can be proved that the computed solution still approximates the true solution in a well-defined sense. In the linear array application, "fast costing" results in a computed optimum beam pattern having sidelobe levels that are theoretically at most 0.08 dB higher than the true optimum level.³ This is a small price to pay for major execution time improvements.

² The theoretical error of at most 0.04 dB is obtained by taking $20 \log_{10}(\sec(\pi/p))$, where $p = 32$. The term $\sec(\pi/p)$ is the error bound discussed in [3].

³ Fast costing squares the error bound, giving $\sec^2(\pi/p)$, or 0.08 dB when $p = 32$.
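The sketch below illustrates both ideas under a stated assumption: we take the "discretized absolute value" of a complex number $z$ to be the largest projection of $z$ onto $p$ equally spaced unit directions, $\max_j \operatorname{Re}(z e^{-2\pi i j/p})$. This polygonal approximation always lies between $|z|\cos(\pi/p)$ and $|z|$, which matches the $\sec(\pi/p)$ factor in the footnotes, but it is our reconstruction rather than a transcription of Algorithm 635, and the real reduced-cost computation also involves simplex dual variables that are not modeled here. The helper names and sample values are hypothetical.

```python
import cmath
import math


def discretized_abs(z, p=32):
    """Assumed form: largest projection of z onto p equally spaced directions.

    Always satisfies |z| * cos(pi/p) <= discretized_abs(z) <= |z|.
    """
    return max((z * cmath.exp(-2j * math.pi * j / p)).real for j in range(p))


def full_costing(samples, p=32):
    """Discretize every sample: M discretized-absolute-value evaluations."""
    return max(discretized_abs(z, p) for z in samples)


def fast_costing(samples, p=32):
    """Pick the sample with the largest true |T(k_m)|, then discretize only it."""
    m_star = max(range(len(samples)), key=lambda m: abs(samples[m]))
    return m_star, discretized_abs(samples[m_star], p)


def bound_db(p, squared=False):
    """Error bound in dB: 20*log10(sec(pi/p)), or of sec^2 for fast costing."""
    sec = 1.0 / math.cos(math.pi / p)
    return 20.0 * math.log10(sec ** 2 if squared else sec)


samples = [0.3 + 0.1j, -0.05 + 0.42j, 0.2 - 0.2j]  # hypothetical T(k_m) values
m_star, fast_value = fast_costing(samples)
print(m_star, full_costing(samples), fast_value)
print(round(bound_db(32), 3), round(bound_db(32, squared=True), 3))  # 0.042 0.084
```

The two printed bounds reproduce the 0.04 dB and 0.08 dB figures quoted in the footnotes for $p = 32$.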


IV.  Algorithm  Implementation  for  In  Situ  Use 

To be useful for in situ application, an algorithm must be reliable, easy to use, and fast when executing on portable microcomputers. This section details the most important hardware and software issues addressed to enable in situ optimal reshading of arrays with failed elements.

The algorithm has been coded in BASIC and consists of Algorithm 635 and an array processing driver program. Algorithm 635 solves the linear program for a set of optimal weights, given data supplied by the driver program. The driver performs the initial setup based on several user inputs and provides all program output.

The  driver  program  may  be  used  with  linear  arrays  having 
either periodically or aperiodically spaced elements. Program
output  consists  of  a  graph  of  the  optimal  beam  pattern,  a  graph 
of  the  optimal  normalized  element  weights,  and  several 
parameters  pertinent  to  the  specific  problem.  Provision  is 
made  for  storing  the  weights  in  a  separate  data  file  for  possible 
use  with  digital  beamformers. 

A Hewlett-Packard (HP) specific software modification was made by setting up the input data arrays (equation (6)) in buffers so that they are accessible for a one-dimensional multiply. For large-array dimensions, indexing a doubly subscripted data array and performing a dot product takes more time on the HP9836C than reading in a data array from a buffer, doing a MAT multiply, and performing a summation. (A MAT multiply is simply an element-by-element multiply of two equally dimensioned data arrays.) However, this procedure is more time consuming when the input data arrays are very small (i.e., when the number of elements in the line array is small). The break-even point occurs at around 12 or 13 elements, so it was decided to incorporate this speed enhancement for the longer-running larger line arrays and to accept some speed reduction on the smaller line arrays.

To  obtain  fast  execution  times  for  in  situ  applications,  we 
use  one  hardware  speed  enhancement,  a  12.5-MHz  fast  CPU 
card  with  16  kbytes  of  cache  memory.  This  hardware 
supplement  is  available  from  HP  for  use  on  the  HP9836C. 
Cache  memory  is  fast  memory  resident  on  the  CPU  card  for 
quick  instruction  acquisition.  The  use  of  the  fast  CPU  board 
rather  than  the  8-MHz  clock  present  in  the  standard  computer 
configuration  results  in  an  approximate  factor-of-two  increase 
in  observed  speed. 

The  complete  program  is  precompiled  by  use  of  software 
and  a  floating  point  math  card  available  from  the  INFOTEK 
company. Precompilation reduces most computational portions
of the BASIC code to machine language, giving an
additional  three-fold  reduction  in  computation  time.  It  is  also 
desirable  to  upgrade  the  operating  system  for  the  HP  to  its 
latest  revision.  All  work  on  these  problems  was  run  using  the 
BASIC  3.0  operating  system  and  the  hardware  supplements 
noted  above. 

Computation time is defined as time spent in Algorithm 635
and  does  not  include  the  small  amount  of  set-up  time  required 
by  the  driver  program.  Computation  times  are  for  the 
compiled  BASIC  program  run  on  the  HP9836C  with  the 
special  hardware  additions  mentioned  above. 

The  program  described  here  needs  just  over  303  kbytes  of 


-7- 


158 


If  f  1  JOI  KNAl  Of  UCFANIC  IMifS'H  RING.  VOL  OF  1L  NO  I,  JANUARY  IRK? 


internal  memory  in  addition  to  the  memory  required  by  the 
operating  system  to  execute  on  the  HP9836C.  This  is  the 
amount of space required when the maximum array size is fixed at
N = 50 and at most M = 256 beam pattern samples are allowed.
Users can change dimensions to suit their specific
needs, but storage requirements presently are directly proportional
to the product NM. Even for a much larger number of
line  array  elements,  it  is  unlikely  that  memory  restrictions 
would  prove  to  be  a  problem  on  the  HP9836C  since  extra 
memory  boards  of  1  Mbyte  each  are  readily  available. 
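As a rough check on the NM scaling, the two largest work arrays declared in the driver listing of Appendix B, `Rea` and `Ima` (each dimensioned 256 by 50, apparently the real and imaginary parts of the M-by-N problem matrix), can be sized directly. This sketch assumes 8 bytes per HP BASIC REAL value and counts only those two arrays, not the program code or the smaller vectors, so it illustrates the scaling rather than reproducing the 303-kbyte total. The function name is hypothetical.

```python
def matrix_storage_bytes(n_elements: int, m_samples: int, bytes_per_real: int = 8) -> int:
    """Bytes for two M-by-N REAL arrays (real and imaginary parts).

    Assumption: 8 bytes per REAL; only the Rea/Ima work arrays are counted.
    """
    return 2 * m_samples * n_elements * bytes_per_real


# Storage grows linearly in N for fixed M = 256 (1 kbyte = 1024 bytes here).
for n in (25, 50, 100):
    print(n, "elements:", matrix_storage_bytes(n, 256) / 1024, "kbytes")
```

At N = 50 and M = 256 these two arrays take 200 kbytes, about two-thirds of the quoted figure. Doubling N doubles that amount, which a single additional 1-Mbyte memory board easily absorbs.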

Ongoing modifications should further enhance the capability
and speed of the BASIC algorithm and driver. The addition
of the ability to handle directional sensors is both useful and
straightforward to implement. Execution of identical code on
the new HP 300 series computers, which have a 16.6-MHz
clock rate, should further reduce the computation time.
Computation times on the order of 5 min for a 50-element
array and 1 min for a 25-element array are anticipated.

It  is  possible  to  run  the  BASIC  program  in  its  uncompiled 
state.  The  execution  of  the  program  with  cache  memory  and 
the  fast  CPU  board  as  the  only  enhancements  results  in 
computation times of approximately 25 min for a 50-element
array  and  4.5  min  for  a  25-element  array. 
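Dividing these uncompiled times by the three-fold precompilation gain quoted above gives a quick consistency check against the compiled times reported in Section V (6.11 to 8.98 min for the 50-element examples, 1 to 2 min for the 25-element examples). The estimates are simple arithmetic, not additional measurements:

$$
t_{\text{compiled}} \approx \frac{t_{\text{uncompiled}}}{3}:
\qquad \frac{25\ \text{min}}{3} \approx 8.3\ \text{min}\ \ (50\ \text{elements}),
\qquad \frac{4.5\ \text{min}}{3} = 1.5\ \text{min}\ \ (25\ \text{elements}).
$$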

A  copy  of  the  entire  program  is  available  from  the  authors. 
Our  specific  implementation  in  HP  BASIC  utilizes  several 
hardware  and  software  devices  to  achieve  computational 
efficiency,  some  of  which  may  not  be  pertinent  to  other 
BASIC  operating  systems  running  on  comparable  machines. 
Users  will  undoubtedly  find  it  necessary  to  make  modifications 
to  the  code  to  allow  it  to  run  on  other  HP  equipment  or  in 
BASIC  on  the  VAX. 

V. Examples

The  following  examples  demonstrate  the  utility  of  the 
current  algorithm  for  application  in  situ  and  provide  insight 
into  different  situations  that  might  arise  when  reshading 
equispaced  arrays  with  failed  elements.  If  optimal  reshading 
can restore the array's original design sidelobe level by
slightly increasing the mainlobe beamwidth, then we say that
the  optimal  reshading  has  been  effective.  Optimal  reshading  is 
effective  in  many  common  element-failure  situations.  When 
the  array  is  severely  degraded,  optimal  reshading  is  less 
effective  but  is  still  useful  in  reducing  the  negative  impact  of 
element failures. These examples demonstrate that the effectiveness
of reshading depends upon the number of element
failures,  as  well  as  the  location  of  the  failed  elements  within 
the  array. 

Missing  elements  are  modeled  by  zeroing  the  appropriate 
weights. In these examples, N refers to the number of intact
array elements, M is the number of beam pattern samples, and
K0 is calculated by using the equation in an earlier footnote.
We  define  the  mainlobe  width  to  be  twice  K0  in  all  three 
examples. 
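The failure model can be made concrete with a short evaluation sketch. It computes the array response B(k) = Σ w_n exp(i k x_n) on M equally spaced samples of a wavenumber range and reports the peak sidelobe relative to B(0), with failed elements modeled by zero weights. The function names and the exp(+ikx) sign convention are assumptions made for illustration. The sketch only evaluates a given set of weights; it does not perform the optimal reshading that Algorithm 635 carries out.

```python
import cmath
import math


def array_response(weights, positions, k):
    """Complex response B(k) = sum_n w_n * exp(i*k*x_n) of a line array."""
    return sum(w * cmath.exp(1j * k * x) for w, x in zip(weights, positions))


def zero_failed(weights, failed):
    """Model failed elements (1-based element numbers) by zeroing their weights."""
    return [0.0 if n in failed else w for n, w in enumerate(weights, start=1)]


def peak_sidelobe_db(weights, positions, k0, k1, m=128):
    """Peak |B(k)| over m equally spaced samples of [k0, k1], in dB re |B(0)|."""
    b0 = abs(array_response(weights, positions, 0.0))
    step = (k1 - k0) / (m - 1)
    peak = max(abs(array_response(weights, positions, k0 + j * step))
               for j in range(m))
    return 20.0 * math.log10(peak / b0)
```

For instance, `zero_failed([1.0] * 25, {2, 4})` models the two failures of Example 1 on a uniformly weighted array. Passing the result to `peak_sidelobe_db` with the range [K0, 2π/D - K0] reports how far the unreshaded sidelobes rise.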

A. Example 1: Effective Reshading

This  example  demonstrates  that  reshading  can  restore  the 
original  sidelobe  level  of  an  array  response  by  slightly 
increasing the mainlobe beamwidth. In a 25-element equispaced
array, originally designed for -30-dB sidelobes,
elements 2 and 4 have failed. Therefore, N = 23, M = 128,
and K0 = 0.6877. We first keep the mainlobe width fixed and
allow the sidelobe level to rise (see Fig. 1). The peak sidelobe
level has risen to -26.86 dB relative to the mainlobe, and the
mainlobe width is unchanged. If the sidelobe level after
reshading is too high, an alternative to discarding or repairing
the array is to broaden the mainlobe beamwidth. In Fig. 2, K0
is increased to 0.775 and the peak sidelobe level diminishes to
-30.04 dB relative to the mainlobe. A trade-off must always be
made between an enlarged mainlobe beamwidth and an
acceptable peak sidelobe level. In this case the mainlobe width was
increased by 12.7 percent in order to recover the original sidelobe
level. Execution times on the HP9836C are between 1 and 2
min for Figs. 1 and 2.
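The value K0 = 0.6877 can be reproduced from the design sidelobe level. The sketch below transliterates lines 170 to 178 of the driver listing in Appendix B. With R = 10^(SLL/20) and n equal to the total number of elements (failed ones included) minus one, it forms x0 = ((R + √(R² - 1))^(1/n) + (R - √(R² - 1))^(1/n))/2, which equals cosh(arccosh(R)/n), and then sets K0 = (2/D) arccos(1/x0) and K1 = 2π/D - K0. The 0.5-m spacing is an assumption: it is not stated in this section, but it reproduces the quoted K0 for the 25-element, -30-dB design. The function name is hypothetical.

```python
import math


def chebyshev_k0_k1(total_elements: int, sidelobe_db: float, spacing: float):
    """Mainlobe edge K0 and upper limit K1 (rad/m) of a Dolph-Chebyshev design.

    Mirrors driver lines 170-178; the element count includes failed elements.
    """
    n = total_elements - 1
    r = 10.0 ** (abs(sidelobe_db) / 20.0)     # sidelobe ratio R
    r3 = math.sqrt(r * r - 1.0)
    x0 = ((r + r3) ** (1.0 / n) + (r - r3) ** (1.0 / n)) / 2.0
    k0 = (2.0 / spacing) * math.acos(1.0 / x0)
    k1 = 2.0 * math.pi / spacing - k0
    return k0, k1


k0, _ = chebyshev_k0_k1(25, 30.0, 0.5)        # 0.5-m spacing is an assumption
print(round(k0, 4))                           # 0.6877, as quoted in Example 1
print(round(100.0 * (0.775 / k0 - 1.0), 1))   # 12.7 percent broadening (Fig. 2)
```

Because the driver computes K0 from the full element count, failed elements do not change the starting value of K0. Only the user's later adjustment of K0 broadens the mainlobe.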

B. Example 2: Moderately Effective Reshading

This example is taken from Streit and Nuttall [1]. Because
of the improvements detailed in Section III, above, the current
algorithm runs faster on the HP9836C than on the VAX 11/780,
although the floating point multiply time on the HP in its
basic configuration is roughly 200 times slower than on the
VAX.

Consider  a  linear  array  with  50  equispaced  elements, 
initially designed for peak sidelobes of -30 dB relative to the
mainlobe. Fig. 3 shows the classical Dolph-Chebyshev beam
pattern with -30-dB sidelobes throughout the minimization
range [K0, (2π/D) - K0]. This was computed using the
current algorithm in 6.11 min. (This ideal case could have
been computed analytically.)
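For the intact array the optimal weights are the classical Dolph-Chebyshev weights, which can be written down without optimization. The sketch below uses a standard textbook construction: it samples the Chebyshev pattern T_n(x0 cos(u/2)), with u = kD and n = N - 1, at the N phases u_j = 2πj/N and recovers the element weights with a direct O(N²) inverse DFT. It is offered for comparison only; it is not the program's method, since Fig. 3 was obtained with Algorithm 635. The function names are hypothetical.

```python
import cmath
import math


def chebyshev_poly(n, x):
    """Chebyshev polynomial T_n(x) for any real x."""
    if abs(x) <= 1.0:
        return math.cos(n * math.acos(x))
    value = math.cosh(n * math.acosh(abs(x)))
    return value if x > 0 or n % 2 == 0 else -value


def dolph_chebyshev_weights(num_elements, sidelobe_db):
    """Real, symmetric weights (peak normalized to 1) for equal sidelobes."""
    n = num_elements - 1
    r = 10.0 ** (abs(sidelobe_db) / 20.0)
    x0 = math.cosh(math.acosh(r) / n)
    weights = []
    for m in range(num_elements):
        shift = (num_elements - 1) / 2.0 - m   # offset of element m from center
        acc = 0j
        for j in range(num_elements):
            pattern = chebyshev_poly(n, x0 * math.cos(math.pi * j / num_elements))
            acc += pattern * cmath.exp(2j * math.pi * j * shift / num_elements)
        weights.append(acc.real / num_elements)
    peak = max(weights)
    return [w / peak for w in weights]
```

Because |T_n| ≤ 1 whenever |x0 cos(u/2)| ≤ 1, these weights hold every sidelobe in [K0, (2π/D) - K0] at exactly 1/R of the mainlobe peak, which is the equal-ripple behavior shown in Fig. 3.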

Now we suppose that five elements, 7, 22, 40, 43, and 50, of the
array have failed. The optimal response after reshading the
array is shown in Fig. 4. The peak sidelobe level has risen to
-25.51 dB, but we have maintained mainlobe beamwidth and
retained full steering capability. In this example N = 45 and
M = 128.

This example (Fig. 4) took 7.47 min on the HP9836C
and  required  292  simplex  iterations.  The  algorithm  of  Streit 
and  Nuttall  required  38.4  min  and  402  iterations  on  the  VAX. 

Recovery of the original sidelobe level is possible (Fig. 5).
The mainlobe beamwidth must be increased by a large factor,
to 257.6 percent of its original value (K0 = 0.871), and the
execution of this task takes 8.98 min and requires 351 iterations.
The constraint that all the weights lie between 0 and 1 is used.
The constraint is necessary in this instance because otherwise
the maximum of the response is displaced from k = 0. This
displacement is due to the presence of too many negatively
weighted elements.

C.  Example  3:  A  Severely  Degraded  Array 

This  example  shows  that,  for  severely  degraded  arrays, 
recovery  of  the  original  sidelobe  level  may  not  be  possible  by 
increasing the mainlobe beamwidth, even after optimal
reshading. Consequently, control of the level of the first sidelobe
must  be  relinquished  in  order  to  gain  control  of  the  level  of  the 
remaining  sidelobes. 

Consider a 25-element array with elements 11 and 14 failed.




SHERRILL  AND  STREIT:  ARRAYS  WITH  FAILED  ELEMENTS 




Fig. 1. Optimized array response and normalized weights for 25 elements
with elements 2 and 4 missing.

Fig. 2. Array response and normalized weights for Example 1 with K0 =
0.775.


The original sidelobe level is -30 dB. Here N = 23, M =
128, and K0 = 0.6877. Fig. 6 shows the algorithm's optimal
response to this configuration. It is a significant observation
that, in this case, small perturbations of K0 will not affect the
level of the sidelobes. Only when the first sidelobe is
incorporated into the mainlobe beamwidth (K0 = 1.27) does
the level of the remaining sidelobes return to the original
desired value (see Fig. 7). It is apparent that decreasing the
minimization interval by moving K0 far enough to the right
will improve the approximation, but one must give up control
of the first sidelobe to reduce the others to acceptable levels.
The net effect of losing two elements so close to the center is
that negligible emphasis is placed on the remaining center
elements (12 and 13) and the rest of the aperture is reshaded as
if it were two separate arrays.

This situation cannot be overcome by using different
weights. The optimal property of the array problem formulation
and solution tells us that no weights exist which can
suppress all the sidelobes below a certain level. Thus this array
has lost too many elements and performance cannot be
restored to its original design levels merely by reshading.

Fig. 3. Classical Dolph-Chebyshev array response and normalized weights
for N = 50 and -30-dB sidelobes.

Fig. 4. Optimized array response and normalized weights for 50 elements
with elements 7, 22, 40, 43, and 50 failed.

We have chosen to relinquish control of the first sidelobe to
gain control of the level of the remaining sidelobes. We pick
the first sidelobe merely for ease of implementation; modification
of the algorithm to forfeit control of a different sidelobe
could also have been done. The need to relinquish control of
the first sidelobe level has only appeared in cases of severe
array degradation due to element losses.




Fig. 5. Recovery of original sidelobe level. Example 2 with K0 = 0.871.

Fig. 6. Optimal array response and normalized weights for 25 elements with
elements 11 and 14 failed.


VI.  Conclusions 

Arrays that have failed elements can be reshaded to obtain
optimal array response functions. Optimal reshading is effective
in many common element-failure situations. When the
array is severely degraded, reshading is less effective, but still
can be used to reduce the negative impact of element failures.


Optimal  reshading  can  be  accomplished  in  situ,  quickly  and 
reliably,  on  portable  microcomputers  using  the  algorithm 
described  here.  Arrays  with  25  elements  routinely  run  in  less 
than  2  min  and  computation  time  for  a  50-element  array  is  less 
than  10  min.  The  algorithm  can  be  applied  to  arrays  of  evenly 
or  unevenly  spaced  linear  geometry. 

The above examples (and others) support the generally
accepted notion that failure of near-center elements is more
detrimental to the array response than failure of near-edge
elements.

Another  application  of  Algorithm  635  is  to  arrays  of  planar 
and  arbitrary  three-dimensional  geometry.  Computation  times 
for  these  more  general  arrays  probably  will  depend  upon  N 
(number of sensors) and M (number of beam pattern samples)
in  the  same  manner  as  for  linear  arrays. 

References 

[1] R. L. Streit and A. H. Nuttall, "A general Chebyshev complex
function approximation procedure and an application to beamforming,"
J. Acoust. Soc. Amer., vol. 72, pp. 181-189, 1982.

[2] I. Barrodale and C. Philips, "Solution of overdetermined systems of
linear equations in the Chebyshev norm," Algorithm 495, ACM
Trans. Math. Software, vol. 1, pp. 264-270, 1975.

[3] R. L. Streit, "Solution of systems of complex linear equations in the L∞
norm with constraints on the unknowns," Soc. Indus. Appl. Math.,
vol. 7, no. 1, pp. 132-149, 1986.

[4] R. L. Streit, "An algorithm for the solution of systems of complex
linear equations in the L∞ norm with constraints on the unknowns,"
Algorithm 635, ACM Trans. Math. Software, vol. 11, no. 3, pp.
242-249, 1985.

[5] J. T. Lewis and R. L. Streit, "Real excitation coefficients suffice for
sidelobe control in a linear array," IEEE Trans. Antennas Propagat.,
vol. AP-30, pp. 1262-1263, 1982; also Naval Underwater Systems
Center, New London, CT, Tech. Memo. 811114, Aug. 17, 1981.


Michael S. Sherrill was born on March 8, 1961, in
Dover, DE. He received the B.S. degree in electrical
engineering from the University of Delaware,
Newark, in 1983.

He joined the staff of the Naval Underwater
Systems Center, New London, CT, upon graduation.
He is interested in mathematical modeling of
physical systems and structured programming. Currently,
he is responsible for a data acquisition and
processing system for towed arrays.


★ 


Roy L. Streit (SM'84) was born in Guthrie, OK, on
October 14, 1947. He received the B.A. degree
(with honors) in mathematics and physics from East
Texas State University, Commerce, in 1968, the
M.A. degree in mathematics from the University of
Missouri, Columbia, in 1970, and the Ph.D. degree
in mathematics from the University of Rhode
Island, Kingston, in 1978.

He is currently an Adjunct Associate Professor of
the Department of Mathematics, University of
Rhode Island. He was a Visiting Scholar in the
Department of Operations Research, Stanford University, Stanford, CA,
during 1981-1982. He joined the staff of the Naval Underwater Systems
Center, New London, CT (then the Underwater Sound Laboratory), in 1970.
He is an Applied Mathematician and has published work in several areas,
including antenna design, complex function approximation theory and
methods, and semi-infinite programming. He is currently conducting research
in towed array design and hidden Markov models.




In  Situ  Optimal  Reshading  Of  Arrays 
With  Failed  Elements: 
Algorithm  Documentation  Package 

M.  S.  Sherrill  and  R.  L.  Streit 




Abstract 


This algorithm documentation package contains three
appendices: Appendix A* is a reprint of an article in the IEEE
Journal of Oceanic Engineering, volume OE-12, number 1, January
1987, "In Situ Optimal Reshading of Arrays with Failed Elements";
Appendix B is a listing for the driver routine in program "Reshade";
and Appendix C,** a 3-1/2-inch floppy disk containing the program
"Reshade" that runs on any HP Series 200/300 microcomputer.


*  Appendix  A  is  the  lead  document  of  this  compilation. 
**  Appendix  C  is  not  included  here. 




In  Situ  Optimal  Reshading  of  Arrays  with  Failed  Elements 


ALGORITHM  DOCUMENTATION  PACKAGE 

This  document  assembles  under  one  cover  information  on  a  NUSC-developed  algorithm  which 
computes  optimal  shading  weights  for  discrete  elements  (sensors)  in  linear  acoustic  arrays.  The 
algorithm  has  been  found  especially  useful  when  elements  fail  and  array  reshading  is  required  in 
situ. The main attractions of the algorithm are that it loads easily on Hewlett-Packard microcomputers,
and that it runs fast enough and is accurate enough to suit most sea trial and engineering
development  applications.  Continuing  requests  for  this  information  since  an  invited  paper  first 
appeared  in  the  IEEE  Journal  of  Oceanic  Engineering  in  January  1987  motivated  the  publication  of 
this  documentation  package. 

The information included here is in hard copy and floppy disk form: the IEEE paper, "In Situ Optimal
Reshading of Arrays with Failed Elements," is reprinted in Appendix A; a program listing of the
application-specific driver routine is given in Appendix B; and a 3-1/2-inch floppy disk, containing the
program "Reshade" which runs on any Hewlett-Packard Series 200 or 300 microcomputer, is pocketed
in Appendix C.


GENERAL  APPLICATION 

When  a  linear  array  of  discrete  acoustic  elements  is  subjected  to  the  rigors  of  the  ocean  environment, 
individual  elements  can  fail.  Element  failures  are  usually  characterized  by  noisy  channels,  or 
intermittent responses, or no response at all. Depending on the number of failed elements and their
specific locations within the array, sidelobe levels of the array wavenumber (k) response can rise
significantly  to  degrade  array  performance.  Because  element  weighting  values  determine  array 
wavenumber  response,  weights  that  are  optimal  for  a  fully  populated  array  have  to  be  recalculated 
when  elements  fail.  The  optimal  reshading  (reweighting)  algorithm  described  here  can  be  applied 
in  situ  to  compute  weighting  values  that  can  reduce  sidelobe  levels  to  approximately  the  original 
design  specification.  In  fact,  in  the  more  common  situations  where  “a  few”  elements  fail,  optimal 
reshading  does  regain  original  sidelobe  levels.  Where  large  numbers  of  elements  fail,  optimal 
reshading  is  still  possible  but  may  be  of  limited  use. 

The  original  approach  for  optimal  reshading  of  a  linear  array  was  proposed  by  Streit  and  Nuttall  in 
1982  (see  reference  1,  Appendix  A).  At  that  time,  the  algorithm  was  run  on  a  VAX  11/780  and 
required  hours  of  computation  time  and  large  amounts  of  mass  storage  for  rudimentary  element 
failure  problems. 

The  1987  reshading  algorithm  incorporates  several  algorithmic  improvements  that  exploit  the  special 
structure  of  the  underlying  linear  programming  problem  to  reduce  time  and  storage  requirements 
by  orders  of  magnitude.  The  current  algorithm  is  still  based  on  the  original  theory  but  is  now  fast 
enough  and  small  enough  to  execute  successfully  in  minutes  instead  of  hours  in  the  application 
environment. Execution time for a 50-element array is typically about 10 minutes. Derivation of the
optimal  reshading  algorithm  and  its  implementation  are  given  in  the  references  of  the  paper  reprinted 
in  Appendix  A;  examples  of  array  reshading  are  given  in  the  paper  itself. 




TO RUN "RESHADE"...

•  Insert the program disk from Appendix C into device/drive.

•  Type the command string LOAD "RESHADE:(device specifier)"

•  Press Enter.

•  When the program is loaded, press Run.

•  Follow  the  prompts. 


PROGRAM  NOTES 

“Reshade”  comprises  a  driver  routine  in  HP  BASIC  which  sets  up  the  necessary  variables  to  be 
optimized  and  a  generic  optimization  routine.  The  driver  is  listed  in  lines  1  through  481  of  the 
program — the  printout  is  contained  in  Appendix  B. 

The  driver  included  in  “Reshade”  applies  to  a  linear  array  of  acoustic  elements,  some  of  which  may 
have  failed  during  the  course  of  a  sea  trial  or  similar  event.  Even  with  the  array  intact,  “Reshade” 
allows  the  user  to  minimize  the  sidelobe  levels  of  the  array  beamformer  output,  given  a  certain 
mainlobe  width.  If  the  minimum  sidelobe  levels  remain  too  high,  it  is  possible  to  alter  the  mainlobe 
beamwidth  to  reduce  sidelobe  levels.  Note  that  the  weights  on  each  element  can  be  set  up  as 
non-negative,  if  desired. 

The program prompts require user inputs, not all of which are self-explanatory. For each user input,
values in (parentheses) are those allowed, and values in [brackets] are the defaults. The maximum
allowable total number of array elements is 50; the minimum is three. The computation time for a
50-element array is approximately 10 minutes, while a 10-element array runs in less than one minute.

The algorithm is applicable to both equispaced and aperiodically spaced linear arrays. In the
equispaced arrangement, the wavenumbers k0 and k1, which delimit the region in which the
minimization is performed, are calculated automatically from the desired sidelobe level. The final
sidelobe level depends upon the number of failed elements in the array and their location. In this
case, only the inter-element spacing must be specified.

In the aperiodically spaced arrangement, every element's position referenced to the forward end of
the array must be specified. If elements have failed (or are missing), they are treated as if they do
not exist. The wavenumbers k0 and k1 are not calculated automatically for an aperiodic array, and
must be entered manually in units of radians/meter.
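The input rules above, together with the checks in the driver listing (lines 39 to 41, 123, 133 to 139, 216, and 225), can be collected into one validation routine. This Python sketch is a hypothetical restatement for readers porting the driver; the function name and messages are not part of "Reshade". Element numbers are 1-based, as in the driver prompts.

```python
def validate_reshade_inputs(total, missing, positions=None, k0=None, k1=None):
    """Return a list of problems; an empty list means the inputs pass.

    positions, k0, and k1 apply only to the aperiodic arrangement.
    """
    problems = []
    if not 3 <= total <= 50:
        problems.append("total number of elements must be 3 to 50")
    if total - len(missing) < 3:
        problems.append("at least three elements must remain intact")
    if any(not 1 <= e <= total for e in missing):
        problems.append("missing element numbers must lie between 1 and the total")
    if len(set(missing)) != len(missing):
        problems.append("missing element numbers must be distinct")
    if positions is not None:
        # Missing elements are skipped, so one position per intact element.
        if len(positions) != total - len(missing):
            problems.append("enter one position per intact element")
        if any(b < a for a, b in zip(positions, positions[1:])):
            problems.append("positions must not decrease along the array")
        if k0 is None or k1 is None or not k1 > k0 >= 0:
            problems.append("need 0 <= k0 < k1 (rad/m)")
    return problems
```

For example, `validate_reshade_inputs(25, [2, 4])` returns an empty list for the configuration of Example 1, while a repeated failed-element number is reported as a problem.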




If unsatisfactory sidelobe levels are still present after running "Reshade" for an equispaced arrangement,
k0 can be increased to provide a larger beamwidth, thus reducing sidelobe levels. For an
aperiodic array, k0 and k1 can be altered manually to reduce the sidelobe level in the region of
interest.

Resultant  weights  can  be  stored  in  a  data  file  in  the  following  format: 

•  Equispaced element arrangement: total number of elements, followed by the inter-element
spacing, followed by array weights.

•  Aperiodic element arrangement: total number of elements, followed by each element's
position, followed by array weights.
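For post-processing on another machine, the stored-weights layout can be mirrored as shown below. The sketch writes and reads the file as plain text, one value per line. That encoding is an assumption, because the actual HP BASIC file type is not specified here. It also assumes the leading count equals the number of stored weights, and the function names are hypothetical. Only the ordering follows the description above: the count first, then the spacing or positions, then the weights.

```python
def write_weights(path, weights, spacing=None, positions=None):
    """Write count, then spacing (equispaced) or positions (aperiodic), then weights."""
    if (spacing is None) == (positions is None):
        raise ValueError("give exactly one of spacing or positions")
    if positions is not None and len(positions) != len(weights):
        raise ValueError("need one position per weight")
    values = [len(weights)]
    values += [spacing] if spacing is not None else list(positions)
    values += list(weights)
    with open(path, "w") as f:
        f.write("\n".join(repr(v) for v in values) + "\n")


def read_weights(path, equispaced):
    """Inverse of write_weights; returns (spacing or positions, weights)."""
    with open(path) as f:
        values = [float(line) for line in f if line.strip()]
    n = int(values[0])
    if equispaced:
        return values[1], values[2:2 + n]
    return values[1:1 + n], values[1 + n:1 + 2 * n]
```

Writing values with `repr` keeps full floating-point precision, so a weight vector read back is identical to the one written.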


PROGRAM  EXTENSIONS  AND  IMPROVEMENTS 

“Reshade”  and  its  associated  algorithm  have  established  the  validity  of  in  situ  computation  of  linear 
acoustic  array  optimal  shading  weights.  Virtually  no  sea  trial  is  conducted  today  without  reshading 
to compensate for failed elements. Extensions to larger linear array problems are potentially useful.
Improvements and modifications to the original source code are possible in the light of recent
advances in signal processing hardware, and are needed to obtain reasonable computation times for
these larger arrays. With the advent of single-board array processors, the beam pattern computations
done (implicitly) in each iteration in the generic optimization model (KAPROX) may be performed
more quickly and accurately using a floating point FFT. This is but one example of software
modifications  which  will  enhance  the  performance  of  “Reshade.” 
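The FFT suggestion can be illustrated for an equispaced array. Zero-padding the weights to a power-of-two length L gives B(k) at the L wavenumbers k_j = 2πj/(LD) in a single transform, instead of L separate sums. This sketch uses a textbook recursive radix-2 FFT in pure Python and the exp(+ikx) convention. Both are assumptions made for illustration, and the sketch says nothing about how KAPROX itself is organized.

```python
import cmath


def fft(a):
    """Recursive radix-2 FFT: X_j = sum_n a_n * exp(-2*pi*i*j*n/len(a))."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for j in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + twiddle
        out[j + n // 2] = even[j] - twiddle
    return out


def beam_pattern_fft(weights, spacing, n_fft):
    """B(k_j) = sum_n w_n * exp(i*k_j*n*spacing) at k_j = 2*pi*j / (n_fft*spacing).

    Weights are zero-padded to n_fft (a power of two); conjugating the input
    and the output turns the forward FFT into the exp(+i...) sum.
    """
    if n_fft < len(weights):
        raise ValueError("n_fft must be at least the number of elements")
    padded = [complex(w) for w in weights] + [0j] * (n_fft - len(weights))
    spectrum = fft([z.conjugate() for z in padded])
    return [z.conjugate() for z in spectrum]
```

A larger L samples the 2π/D period more finely, at a cost that grows as L log L rather than as the product of L and the number of elements.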

The  generic  nature  of  the  optimization  routine  lends  itself  to  the  solution  of  more  general  array 
problems. These arrays may be multiline, planar, or three-dimensional with arbitrary geometry. Each
geometry, however, will require a specific driver routine to set up the problem to be optimized. In
general, the drivers would need the capability to address complex weights, to allocate enough memory
for computations, and to take into account any application-specific constraints imposed on the
optimization  problem.  Additional  constraints  can  be  useful;  for  instance,  constraints  can  sometimes 
be  used  in  active  arrays  to  control  adverse  effects  of  acoustic  coupling  between  the  array  elements. 




Appendix  B 


DRIVER PROGRAM LISTING: "Reshade"
lines 1 through 481


1 OPTION BASE 1
2 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
3 PRINTER IS CRT
4 RAD
5 REAL Estore(50),U(256),Tbeam(256),Grres(256),Hsens(256)
6 INTEGER Ioexit(10),Itlog(10),Icount(50),Ijsut(3,51)
7 INTEGER Ldim,I,J,K,N,C5,Symflag,Cachflag,Floatflag,H
8 COM /Arrss/ Zradii(50),Bradii(4),Cheb(10),Z(50),Zcentr(50)
9 COM /Arrss1/ Ref(256),Imf(256),Reb(4,50) BUFFER,Imb(4,50) BUFFER,Rebcentr(4),Imbcentr(4)
10 COM /Proj/ Basinv(51,54),Cossin(2,1025),Rea(256,50) BUFFER,Ima(256,50) BUFFER,Cos45,Space
11 COM /Param/ INTEGER Ndim,M,L,Logp,Ndimp1,Ndimp4,Sll,Tme,Misel(10)
12 COM /Buffmult/ Colrea(50),Colima(50),Colreb(50),Colimb(50)
13 COM /Groups/ INTEGER Nogroup,REAL Senslen,Xgroup(25),Dgroup,D(50)
14 COM /Groups1/ Hydsen$[3],Hydro$[3]
15 DIM Sll$[3],Equi$[3],Weight$[3],Negwet$[3],Newko$[3],Data_msus$[10],Filename$[10],Wgtstore$[3],Group_space$[3]
16 !
17 Data_msus$=":INTERNAL"
18 Cachflag=0
19 ON ERROR GOTO 24 ! POSSIBLE ERRORS IF INTERFACE NOT PRESENT
20 CONTROL 32,1;1 ! IF CACHE MEMORY IS PRESENT IT WILL BE UTILIZED
21 OFF ERROR
22 STATUS 32,1;Stats
23 IF Stats THEN Cachflag=1
24 Floatflag=0
25 ON ERROR GOTO Redo
26 CONTROL 32,2;1 ! IF FLOATING POINT CARD PRESENT IT WILL BE UTILIZED
27 OFF ERROR
28 STATUS 32,2;Stats
29 IF Stats THEN Floatflag=1
30 !
31 Redo: ! OBTAIN INPUT DATA
32 LOOP
33 IF Ndim=0 THEN
34 Ndim=16
35 ELSE
36 Ndim=Ndim+Tme
37 END IF
38 REPEAT
39 PRINT "ENTER TOTAL NUMBER OF ELEMENTS/GROUPS IN ARRAY (3-50) [";VAL$(Ndim);"]"
40 INPUT Ndim
41 UNTIL Ndim>2 AND Ndim<51
42 !
43 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
44 REPEAT
45 PRINT "ENTER NUMBER OF SENSORS IN EACH GROUP: [";VAL$(Nogroup);"]"
46 INPUT Nogroup
47 UNTIL Nogroup<26
48 IF Nogroup=0 THEN Nogroup=1
49 !
50 IF Nogroup<>1 THEN
51 REDIM Xgroup(Nogroup)
52 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
53 REPEAT
54 Group_space$=""
55 INPUT "IS ELEMENT SPACING WITHIN THE GROUP CONSTANT? (Y/N) [Y]",Group_space$
56 IF LEN(Group_space$)=0 THEN
57 Group_space$="Y"
58 ELSE
59 Group_space$=UPC$(Group_space$[1])
60 END IF
61 UNTIL Group_space$="Y" OR Group_space$="N"



62 !
63 IF Group_space$="N" THEN
64 REPEAT
65 H=0
66 PRINT "ENTER POSITIONS OF SENSORS IN GROUP:"
67 FOR I=1 TO Nogroup
68 PRINT "SENSOR #";VAL$(I);":"
69 INPUT Xgroup(I)
70 IF I>1 AND Xgroup(I)<Xgroup(I-1) THEN H=H+1
71 NEXT I
72 UNTIL H=0
73 ELSE
74 REPEAT
75 PRINT "ENTER SPACING BETWEEN SENSORS IN GROUP: [";VAL$(Dgroup);"]"
76 INPUT Dgroup
77 UNTIL Dgroup>0
78 FOR I=1 TO Nogroup
79 Xgroup(I)=(I-1)*Dgroup
80 NEXT I
81 END IF
82 ELSE
83 MAT Grres= (1.)
84 END IF
85 !
86 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
87 REPEAT
88 Hydsen$=""
89 PRINT "DO YOU WISH TO INCORPORATE A HYDROPHONE SENSITIVITY? (Y/N) [N]"
90 INPUT Hydsen$
91 IF LEN(Hydsen$)=0 THEN
92 Hydsen$="N"
93 ELSE
94 Hydsen$=UPC$(Hydsen$[1])
95 END IF
96 UNTIL Hydsen$="Y" OR Hydsen$="N"
97 !
98 IF Hydsen$="N" THEN
99 MAT Hsens= (1.)
100 ELSE
101 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
102 REPEAT
103 PRINT "ENTER THE PHYSICAL SENSOR LENGTH: (METERS) [";VAL$(Senslen);"]"
104 INPUT Senslen
105 UNTIL Senslen>0.
106 !
107 REPEAT
108 Hydro$=""
109 PRINT "IS HYDROPHONE TO BE MODELED AS A DIPOLE OR CONTINUOUS SENSOR? (D/C) [C]"
110 INPUT Hydro$
111 IF LEN(Hydro$)=0 THEN
112 Hydro$="C"
113 ELSE
114 Hydro$=UPC$(Hydro$[1])
115 END IF
116 UNTIL Hydro$="C" OR Hydro$="D"
117 END IF
118 !
119 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
120 REPEAT
121 PRINT "ENTER TOTAL NUMBER OF MISSING ELEMENTS/GROUPS [";VAL$(Tme);"]"
122 INPUT Tme
123 UNTIL Tme>=0 AND Ndim-Tme>2
124 !




125 IF Tme>0 THEN
126 REDIM Misel(Tme)
127 REPEAT
128 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
129 PRINT "ENTER MISSING ELEMENT/GROUP NUMBERS (SEPARATED BY COMMAS) [";Misel(*);"]"
130 INPUT Misel(*)
131 MAT SORT Misel(*)
132 H=0
133 FOR I=1 TO Tme
134 IF Misel(I)<1 OR Misel(I)>Ndim THEN H=H+1
135 IF I>1 THEN
136 IF Misel(I)=Misel(I-1) THEN H=H+1
137 END IF
138 NEXT I
139 UNTIL H=0
140 END IF
141 !
142 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
143 REPEAT
144 INPUT "ARE ALL ELEMENTS/GROUPS EQUISPACED? (Y/N) [Y]",Equi$
145 IF LEN(Equi$)=0 THEN
146 Equi$="Y"
147 ELSE
148 Equi$=UPC$(Equi$[1])
149 END IF
150 UNTIL Equi$="Y" OR Equi$="N"
151 !
152 REDIM D(Ndim-Tme)
153 New_ko: !
154 Symflag=0 ! FLAG FOR ARRAY SYMMETRY
155 IF Equi$="Y" THEN ! EQUISPACED ARRAY
156 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
157 IF Sll=0 THEN Sll=30
158 REPEAT
159 PRINT "ENTER ORIGINAL SIDELOBE LEVEL (DB): (0 TO 50) [-";VAL$(Sll);"]"
160 INPUT Sll$
161 IF LEN(Sll$)<>0 THEN Sll=ABS(VAL(Sll$))
162 UNTIL Sll>=1 AND Sll<51
163 !
164 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
165 REPEAT
166 PRINT "ENTER ELEMENT/GROUP SPACING (METERS) (0-15) [";VAL$(Space);"]"
167 INPUT Space
168 UNTIL Space>0 AND Space<=15
169 !
170 N=Ndim-1
171 R=10^(Sll/20) ! CALCULATE K0
172 R2=R*R
173 R3=SQR(R2-1.)
174 R5=(R+R3)^(1./N)
175 R6=(R-R3)^(1./N)
176 Zo=(R5+R6)/2.
177 Ko=(2./Space)*ACS(1./Zo)
178 K1=2.*PI/Space-Ko
179 !
180 IF Newko$="Y" THEN
181 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
182 REPEAT
183 PRINT "ENTER K0: [";VAL$(Ko);"]"
184 PRINT "SUGGESTED VALUE IS: ";PROUND(Ko,-4)
185 INPUT Ko
186 K1=2.*PI/Space-Ko
187 IF Hydsen$="Y" OR Nogroup>1 THEN



188 PRINT "ENTER K1: [";VAL$(K1);"]"
189 INPUT K1
190 END IF
191 UNTIL Ko>0 AND Ko<PI/Space AND Ko<K1
192 Ndim=Ndim+Tme
193 END IF
194 !
195 C5=0
196 FOR I=1 TO Ndim
197 IF Tme>0 THEN
198 FOR J=1 TO Tme
199 IF I=Misel(J) THEN 204
200 NEXT J
201 END IF
202 C5=C5+1
203 D(C5)=Space*(I-1)
204 NEXT I
205 CALL Symd(Ndim-Tme,Symflag,D(*))
206 ELSE
207 PRINT "ENTER ELEMENT/GROUP POSITIONS (METERS FROM END):"
208 PRINT "SKIP MISSING ELEMENT/GROUP POSITIONS."
209 IF Newko$="Y" THEN Ndim=Ndim+Tme
210 FOR I=1 TO Ndim-Tme
211 REPEAT
212 H=0
213 PRINT "ELEMENT/GROUP ";VAL$(I);" [";VAL$(D(I));"]"
214 INPUT D(I)
215 IF I>1 THEN
216 IF D(I)<D(I-1) THEN H=H+1
217 END IF
218 UNTIL H=0
219 PRINT D(I)
220 NEXT I
221 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
222 REPEAT
223 INPUT "ENTER K0: (RAD/METER) ",Ko
224 INPUT "ENTER K1: (RAD/METER) ",K1
225 UNTIL K1>Ko AND Ko>=0
226 CALL Symd(Ndim-Tme,Symflag,D(*))
227 END IF
228 !
229 Ndim=Ndim-Tme
230 Ndimp1=Ndim+1
231 Ndimp4=Ndim+4
232 IF Equi$="Y" THEN
233 M=64
234 C3=(K1-Ko)/(2.*M-1.)
235 ELSE
236 M=128
237 C3=(K1-Ko)/(M-1.)
238 END IF
239 Logp=5
240 P1=(2^Logp)+1
241 !
242 OUTPUT 2 USING "#,B";255,75 ! CLEAR SCREEN
243 REPEAT
244 Negwet$=""
245 INPUT "WILL YOU ALLOW NEGATIVE WEIGHTS? (Y/N) [Y]",Negwet$
246 IF LEN(Negwet$)=0 THEN
247 Negwet$="Y"
248 ELSE
249 Negwet$=UPC$(Negwet$[1])
250 END IF
251 UNTIL Negwet$="Y" OR Negwet$="N"
252 IF Negwet$="Y" THEN ! NO CONSTRAINTS
253 L=0


254  ELSE
255  L=1  ! 1 CONSTRAINT: |SUM(Wj)-.5|<=.5, |Wj-.5|<=.5
256  END IF  ! CHANGE L AND REDIM APPROPRIATE ARRAYS FOR MORE CONSTRAINTS
257  Ldim=MAX(1,L)
258  !
259  !  REDIMENSION INPUT ARRAYS
260  !
261  REDIM Ioexit(Logp),Ijsut(3,Ndimp1),Itlog(Logp),Icount(Ndim),Zradii(Ndim)
262  REDIM Bradii(Ldim),Cheb(Logp),Estore(Ndim),Z(Ndim),Zcentr(Ndim)
263  REDIM Basinv(Ndimp1,Ndimp4),Cossin(2,P1),Rea(M,Ndim),Ima(M,Ndim)
264  REDIM Ref(M),Imf(M),Reb(Ldim,Ndim),Imb(Ldim,Ndim),Rebcentr(Ldim)
265  REDIM Imbcentr(Ldim),D(Ndim),U(M),Tbeam(2*M),Grres(M),Hsens(M)
266  REDIM Colrea(Ndim),Colima(Ndim),Colreb(Ndim),Colimb(Ndim)
267  !

268  MAT Basinv= (0.)  ! INITIALIZE COMMONS
269  MAT Cossin= (0.)
270  MAT Z= (0.)
271  MAT Cheb= (0.)
272  MAT Colrea= (0.)
273  MAT Colima= (0.)
274  MAT Colreb= (0.)
275  MAT Colimb= (0.)
276  !
277  IF Negwet$="Y" THEN
278  MAT Reb= (0.)
279  MAT Imb= (0.)
280  MAT Rebcentr= (0.)
281  MAT Imbcentr= (0.)
282  MAT Bradii= (0.)
283  MAT Zcentr= (0.)
284  MAT Zradii= (1.)
285  ELSE
286  MAT Reb= (1.)
287  MAT Imb= (0.)
288  MAT Rebcentr= (.5)
289  MAT Imbcentr= (0.)
290  MAT Bradii= (.5)
291  MAT Zcentr= (.5)
292  MAT Zradii= (.5)
293  END IF
294  !

295  FOR I=1 TO M
296  U(I)=Ko+C3*(I-1)  ! GENERATE U ARRAY
297  NEXT I
298  !
299  IF Hydsen$="Y" THEN  ! CALCULATE SENSITIVITY TERM
300  MAT Hsens= (0.)
301  IF Hydro$="D" THEN  ! DIPOLE SENSITIVITY
302  FOR J=1 TO M
303  Const=.5+.5*COS(U(J)*Senslen)
304  Hsens(J)=Const*Const
305  Const=-.5*SIN(U(J)*Senslen)
306  Hsens(J)=SQR(Hsens(J)+Const*Const)
307  NEXT J
308  ELSE  ! CONTINUOUS SENSITIVITY
309  FOR J=1 TO M
310  Hsens(J)=ABS(SIN(U(J)*Senslen/2.)/(U(J)*Senslen/2.))
311  NEXT J
312  END IF
313  END IF
314  !
315  IF Nogroup<>1 THEN  ! CALCULATE GROUP RESPONSE
316  MAT Grres= (0.)
317  FOR J=1 TO M
318  Grim=0.
319  FOR I=1 TO Nogroup


320  Grres(J)=Grres(J)+COS(U(J)*Xgroup(I))
321  Grim=Grim-SIN(U(J)*Xgroup(I))
322  NEXT I
323  Grres(J)=SQR(Grres(J)*Grres(J)+Grim*Grim)/Nogroup
324  NEXT J
325  END IF
326  !
327  FOR J=1 TO M
328  Ref(J)=COS(D(Ndim)*U(J))  ! GENERATE F ARRAY
329  Imf(J)=-SIN(D(Ndim)*U(J))
330  FOR I=1 TO Ndim
331  Rea(J,I)=Ref(J)-COS(D(I)*U(J))  ! GENERATE Hk ARRAY
332  Ima(J,I)=Imf(J)+SIN(D(I)*U(J))
333  Rea(J,I)=Rea(J,I)*Grres(J)*Hsens(J)
334  Ima(J,I)=Ima(J,I)*Grres(J)*Hsens(J)
335  NEXT I
336  NEXT J
337  !
338  FOR I=1 TO M
339  Ref(I)=Ref(I)*Grres(I)*Hsens(I)
340  Imf(I)=Imf(I)*Grres(I)*Hsens(I)
341  NEXT I
342  !
343  N=Ndim-1
344  Itlog(1)=20*N  ! MAX ITERATION COUNT
345  Ioexit(1)=0  ! PRINT OPTION
346  Ts=TIMEDATE  ! INITIALIZE TIME
347  CALL Kaprox(N,Itlog(*),Ioexit(*),Ijsut(*))
348  Te=TIMEDATE-Ts  ! EXECUTION TIME
349  Estore(N)=Cheb(Logp)
350  Icount(N)=Itlog(Logp)
351  !
352  Zsum=0.  ! CALCULATE FINAL WEIGHT=1-SUM OF ALL OTHERS
353  Zsum=SUM(Z)
354  Z(Ndim)=1.-Zsum
355  !
356  IF Symflag THEN  ! SYMMETRIZE WEIGHTS
357  FOR I=1 TO INT(Ndim/2)
358  P=(Z(I)+Z(Ndim-I+1))/2
359  Z(I)=P
360  Z(Ndim-I+1)=P
361  NEXT I
362  END IF
363  !
364  IF Tme>0 THEN
365  REDIM Z(Ndim+Tme)
366  FOR I=1 TO Tme
367  FOR J=Ndim+I TO Misel(I) STEP -1
368  IF J>Misel(I) THEN Z(J)=Z(J-1)
369  NEXT J
370  Z(Misel(I))=0.
371  NEXT I
372  END IF
373  !
374  IF Equi$="Y" THEN
375  CALL Calcbeam(Ndim,M,Tme,Misel(*),Space,Z(*),Tbeam(*))
376  ELSE
377  CALL Uneqbeam(Ndim,M,Tme,Misel(*),Ko,K1,Z(*),Tbeam(*))
378  END IF
379  !
380  Zmax=MAX(Z(*))
381  IF Zmax<>0 THEN
382  FOR I=1 TO Ndim+Tme  ! NORMALIZE WEIGHTS TO 1
383  Z(I)=Z(I)/Zmax
384  NEXT I
385  END IF


386  !
387  CALL Weight_plot(Ndim+Tme,2*M,2,Tme,Misel(*),Space,Ko,K1,C3,Z(*),Tbeam(*),Equi$)
388  OUTPUT 2 USING "#,B";255,75
389  PRINTER IS PRT
390  PRINT USING "@"
391  DUMP GRAPHICS
392  !
393  CONTROL CRT,12;2
394  FOR I=0 TO 9
395  ON KEY I LABEL "" GOSUB Dummy
396  NEXT I
397  ON KEY 1 LABEL " CONTINUE " GOTO Comp
398  ON KEY 2 LABEL " KEYS OFF/ON " GOSUB Flip_key
399  LOOP
400  END LOOP
401  Dummy: !
402  RETURN
403  Flip_key: !
404  Keflip=(Keflip+1) MOD 2
405  IF Keflip THEN
406  CONTROL CRT,12;1
407  ELSE
408  CONTROL CRT,12;2
409  END IF
410  RETURN
411  Comp: !
412  GRAPHICS OFF
413  OFF KEY
414  Weight$=""
415  REPEAT
416  INPUT "WOULD YOU LIKE A LIST OF THE WEIGHTS? (Y/N) [Y]",Weight$
417  IF LEN(Weight$)=0 THEN
418  Weight$="Y"
419  ELSE
420  Weight$=UPC$(Weight$[1])
421  END IF
422  UNTIL Weight$="Y" OR Weight$="N"
423  !
424  CALL Printinputs(Cachflag,Floatflag,Ndim,Tme,Sll,Itlog(*),Logp,Misel(*),Space,Ko,K1,Te,Cheb(*),Z(*),Equi$,Weight$,Negwet$)
425  !
426  OUTPUT 2 USING "#,B";255,75  ! CLEAR SCREEN
427  Newko$=""
428  REPEAT
429  INPUT "WOULD YOU LIKE TO CALCULATE A NEW Ko or K1 TO GIVE A DIFF. BEAM WIDTH (Y/N) [N]",Newko$
430  IF LEN(Newko$)=0 THEN
431  Newko$="N"
432  ELSE
433  Newko$=UPC$(Newko$[1])
434  END IF
435  UNTIL Newko$="Y" OR Newko$="N"
436  IF Newko$="Y" THEN GOTO New_ko
437  !
438  Wgtstore$=""
439  REPEAT
440  INPUT "WOULD YOU LIKE TO STORE THE WEIGHTS IN A DATA FILE? (Y/N) [N]",Wgtstore$
441  IF LEN(Wgtstore$)=0 THEN
442  Wgtstore$="N"
443  ELSE
444  Wgtstore$=UPC$(Wgtstore$[1])
445  END IF
446  UNTIL Wgtstore$="Y" OR Wgtstore$="N"
447  IF Wgtstore$="Y" THEN


448  OUTPUT 2 USING "#,B";255,75  ! CLEAR SCREEN
449  INPUT "ENTER FILENAME FOR WEIGHT FILE: (10 CHARACTERS) ",Filename$
450  OUTPUT 2 USING "#,B";255,75  ! CLEAR SCREEN
451  INPUT "ENTER MASS STORAGE DEVICE: [ INTERNAL ] ",Data_msus$
452  IF Equi$="Y" THEN
453  IF Ndim>30 THEN
454  CREATE BDAT Filename$&Data_msus$,2,256
455  ELSE
456  CREATE BDAT Filename$&Data_msus$,1,256
457  END IF
458  ASSIGN @Stordat TO Filename$&Data_msus$
459  OUTPUT @Stordat;Ndim+Tme,Space,Z(*)
460  ELSE
461  SELECT Ndim
462  CASE >47
463  CREATE BDAT Filename$&Data_msus$,4,256
464  CASE >31
465  CREATE BDAT Filename$&Data_msus$,3,256
466  CASE >15
467  CREATE BDAT Filename$&Data_msus$,2,256
468  CASE ELSE
469  CREATE BDAT Filename$&Data_msus$,1,256
470  END SELECT
471  ASSIGN @Stordat TO Filename$&Data_msus$
472  OUTPUT @Stordat;Ndim+Tme,D(*),Z(*)
473  END IF
474  ASSIGN @Stordat TO *
475  END IF
476  !
477  GRAPHICS ON
478  PAUSE
479  GRAPHICS OFF
480  END LOOP  ! RETURNS TO Redo AT PROGRAM BEGINNING
481  END
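After CALL Kaprox returns, the main program above post-processes the weights in four steps:

1. It sets the final weight to one minus the sum of the others.
2. It optionally symmetrizes the weights.
3. It reinserts a zero weight at each failed element listed in Misel.
4. It normalizes the weights so that the largest is 1.

The beam-pattern calls between steps 3 and 4 are left out here. Below is a minimal Python sketch of these steps. It assumes the solver hands back the Ndim - 1 free weights as a plain list. It also assumes the failed-element indices are 1-based and ascending, as the element-shifting loop requires. The function name finalize_weights is hypothetical.

```python
def finalize_weights(z_free, missing, symmetric=False):
    """Post-process free weights the way the listing does (illustrative sketch).

    z_free:    the Ndim - 1 weights found by the Chebyshev solver.
    missing:   1-based indices of failed elements (the Misel array).
    symmetric: when True, average each weight with its mirror image.
    """
    # "CALCULATE FINAL WEIGHT=1-SUM OF ALL OTHERS": enforce sum(w) = 1.
    z = list(z_free) + [1.0 - sum(z_free)]
    ndim = len(z)

    # "SYMMETRIZE WEIGHTS": average pairs placed symmetrically about the center.
    if symmetric:
        for i in range(ndim // 2):
            p = (z[i] + z[ndim - 1 - i]) / 2
            z[i] = z[ndim - 1 - i] = p

    # Reinsert a zero weight at each failed element, lowest index first.
    for m in sorted(missing):
        z.insert(m - 1, 0.0)

    # "NORMALIZE WEIGHTS TO 1": divide by the largest weight, if nonzero.
    zmax = max(z)
    if zmax != 0:
        z = [wk / zmax for wk in z]
    return z


print(finalize_weights([0.2, 0.3], missing=[2]))  # [0.4, 0.0, 0.6, 1.0]
```

Python's list.insert stands in for the explicit shift-down loop of the BASIC code. Processing the failed indices in ascending order keeps each one pointing at its final position.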


A  General  Chebyshev  Complex  Function 
Approximation  Procedure  And  An 
Application  to  Beamforming 


R.  L.  Streit  and  A.  H.  Nuttall 




A  general  Chebyshev  complex  function  approximation  procedure 
and  an  application  to  beamforming 

R.  L.  Streit  and  A.  H.  Nuttall 

Naval Underwater Systems Center, New London Laboratory, New London, Connecticut 06320

(Received  22  December  1981;  accepted  for  publication  24  March  1982) 

A  new  computational  technique  is  described  for  the  Chebyshev,  or  minimax,  approximation  of  a 
given  complex  valued  function  by  means  of  linear  combinations  of  given  complex  valued  basis 
functions.  The  domain  of  definition  of  all  functions  can  be  any  finite  set  whatever.  Neither  the 
basis  functions  nor  the  function  approximated  need  satisfy  any  special  hypotheses  beyond  the 
requirement  that  they  be  defined  on  a  common  domain.  Theoretical  upper  and  lower  bounds  on 
the  accuracy  of  the  computed  Chebyshev  error  are  derived.  These  bounds  permit  both  a  priori  and 
a  posteriori  error  assessments.  Efforts  to  extend  the  method  to  functions  whose  domain  of 
definition is a continuum are discussed. An application is presented involving "re-shading" a 50-element antenna array to minimize the effects of a 10% element failure rate, while maintaining full steering capability and mainlobe beamwidth.

PACS numbers: 43.60.Gk, 43.30.Vh


LIST OF SYMBOLS

$f$: the given complex valued function to be approximated

$h_1, \ldots, h_n$: the given basis functions; linear combinations of these functions are used to approximate $f$

$Q_m$: the given finite point set; approximations to $f$ are constructed on the $m$ elements of this set. (Ordinarily, $Q_m$ is a set of complex numbers; however, $Q_m$ can be any finite set on which $f$ and $h_k$ are defined.)

$z_1, \ldots, z_m$: the elements of $Q_m$

$a = (a_1, \ldots, a_n)$: any vector of complex numbers used as coefficients of the basis functions

$e_n(z;a)$: the complex error "curve" of the approximation to $f$ afforded by the coefficient vector $a$; defined by Eq. (A1)

$E_m(f)$: the actual maximum magnitude error committed by the "best" (i.e., Chebyshev, or minimax) approximation to $f$ by linear combinations of $h_1, \ldots, h_n$; defined by Eq. (A2)

$a^*$: any coefficient vector for which $E_m(f)$ is actually attained; $a^*$ is itself approximated by $\hat{a}$ (see below)

$\mathbb{C}^n$: the usual vector space of all $n$-tuples of complex numbers

$p$: a given integer greater than or equal to 2; the larger $p$, the better will be the approximation of $a^*$ by $\hat{a}$ (see Theorems A1 and A2)

$\theta_1, \ldots, \theta_p$: angles defined in Lemma A2 and dependent only on $p$

$R_n(z;a)$: the real part of the error curve $e_n(z;a)$

$I_n(z;a)$: the imaginary part of the error curve $e_n(z;a)$

$G_j(z;a)$: the projection of the error curve $e_n(z;a)$ onto the real axis of the complex plane after a rotation through the angle $\theta_j$; defined by Eq. (A3)

$M_{mp}(f)$: essentially an approximation to the number $E_m(f)$; defined by Eq. (A4)

$\hat{a}$: any coefficient vector for which $M_{mp}(f)$ is actually attained; $\hat{a}$ is essentially an approximation to the vector $a^*$ (see above)

$\hat{E}_m(f)$: the maximum magnitude error committed using the coefficient vector $\hat{a}$; defined in Theorem A2

$u(z), v(z)$: the real and imaginary parts, respectively, of $f(z)$

$r_k(z), s_k(z)$: the real and imaginary parts, respectively, of the basis function $h_k(z)$

$R, S$: real $m \times n$ matrices whose entries in the $j$th row and $k$th column are $r_k(z_j)$ and $s_k(z_j)$, respectively; used to construct the matrix $B$ (see below)

$b_k, c_k$: the real and imaginary parts, respectively, of the coefficient $a_k$ of basis function $h_k$

$Bx = g$: the real overdetermined system of $mp$ equations in $2n$ unknowns whose Chebyshev solution yields a solution $\hat{a}$ to the problem $M_{mp}(f)$; see the paragraph containing Eq. (A6) for details of construction

$\tilde{B}x = \tilde{g}$: analogous to $Bx = g$ when the solution vector $\hat{a}$ is forced to be a vector of real numbers; see Eq. (A7) for details

Other: all notations not in this glossary are understood to be "local"; that is, they are used only in the context of the particular paragraphs that contain them


181  J. Acoust. Soc. Am. 72(1), July 1982


INTRODUCTION 

The approximation of desired or given functional behavior by finite sets of simpler or specified basis functions is a recurrent problem in many fields. For example, in the mathematical field, we might wish to approximate a (desired) complex integral by a set of (simpler) sinusoidal components. Or, in an antenna array processing application, we often want to realize a (given) low side-lobe behavior by means of an array with (specified) element locations that are not under our control.

Consider the case in which three conditions hold:

- the given functional behavior and the specified basis functions are all real valued;
- they are defined on a finite discrete data set;
- the approximation is afforded by a real-weighted linear combination of these basis functions.

In this case the optimum solution, the one minimizing the maximum magnitude error (i.e., the Chebyshev norm), is well in hand, owing to an excellent algorithm given in Barrodale and Phillips.1,2 Specifically, this algorithm solves the following mathematical problem. Given real constants $\{f_i\}$ and $\{h_{ik}\}$, where $1 \le i \le m$, $1 \le k \le n$, and $m > n$, it determines the real quantities $\{a_k\}_1^n$ that minimize the maximum absolute value of the error residuals

$$e_i = f_i - \sum_{k=1}^{n} a_k h_{ik}, \qquad 1 \le i \le m. \qquad (1)$$

This algorithm has recently been used to good advantage in an array processing application. There it designed some real symmetric weighting functions with very good side-lobe behavior, subject to constraints on the rate of decay of the distant side lobes.3
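The sketch below gives a feel for Eq. (1). It evaluates the maximum residual for a trial coefficient vector. It then solves only the simplest case, n = 1, by ternary search on this convex, piecewise-linear objective. This is a toy stand-in, not the Barrodale and Phillips algorithm, which handles general n by linear programming. The function names are illustrative.

```python
def max_residual(a, f, h):
    """Chebyshev norm max_i |f_i - sum_k a_k h_ik| of the residuals in Eq. (1).

    f: list of m real values; h: m-by-n list of lists; a: list of n coefficients.
    """
    return max(abs(fi - sum(ak * hik for ak, hik in zip(a, hi)))
               for fi, hi in zip(f, h))


def best_single_coefficient(f, h_col, lo=-1e3, hi=1e3, iters=200):
    """Solve the n = 1 case of Eq. (1): minimize max_i |f_i - a*h_i| over real a.

    The objective is a maximum of absolute values of affine functions of a, hence
    convex, so ternary search on [lo, hi] converges to a minimizer (assumed to lie
    inside the bracket).
    """
    h = [[x] for x in h_col]
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if max_residual([m1], f, h) < max_residual([m2], f, h):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    a = (lo + hi) / 2
    return a, max_residual([a], f, h)


# Constant approximation (h_i = 1): the best constant is the midrange of the data.
a, err = best_single_coefficient([1.0, 4.0, 2.0, 7.0], [1.0] * 4)
print(round(a, 6), round(err, 6))  # 4.0 3.0
```

The example shows the characteristic minimax behavior: the optimum balances the two extreme residuals, here $+3$ at $f = 7$ and $-3$ at $f = 1$.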

Here we wish to employ the algorithm, described above for real variables in Eq. (1), to minimize the Chebyshev norm of

$$e_n(z;a) = f(z) - \sum_{k=1}^{n} a_k h_k(z) \qquad (2)$$

when $f(z)$ and $\{h_k(z)\}_1^n$ are complex, and $z$ can take values in an arbitrary finite discrete point set. The weighting coefficients $\{a_k\}_1^n$ may be complex or, alternatively, restricted to be real. One application is an antenna array with arbitrarily specified element locations whose weights are restricted to be real. Another is an array whose weights are also allowed to be phased (complex). The rest of the main body of the paper contains three parts:

- numerical examples and applications of the technique;
- some efforts to extend the method to a continuum of values of $z$;
- a discussion.

The Appendix develops the basic mathematical theory and algorithm for the minimization of Eq. (2). Streit and Nuttall4 present a FORTRAN program in a form that should be useful to readers who wish to apply the technique to their own particular applications. Unfortunately, the listing is too long to include here. (A brief study of the Appendix, especially with regard to Eq. (A6), should enable interested readers to write their own program.)

Although the above algorithm1 is limited to a discrete set of points, it has been used fruitfully to minimize the continuous error [Eq. (2)] over a real variable $z$ in the interval $[z_a, z_b]$, when $f$ and $\{h_k\}$ are real. This was done in the following manner.



The procedure ran as follows.

1. An initial set of $m > n$ real points $\{z_i'\}_1^m$ was specified. The Chebyshev norm was minimized in the usual fashion, resulting in the coefficient set $\{a_k'\}_1^n$.
2. For this set of optimum coefficients, the locations $\{z_i''\}_1^l$ of the largest peaks of $|e_n(z;a')|$ were found. This was done by setting the derivative $e_n'(z;a')$ to zero and solving numerically for $\{z_i''\}_1^l$. The number $l$ of such peaks will generally be less than $m$ but larger than $n$. [This approach presumes that computable expressions for $f'(z)$ and $\{h_k'(z)\}_1^n$ are available.]
3. The modified set of points $\{z_i''\}_1^l$ was then used for another Chebyshev minimization, resulting in the coefficient set $\{a_k''\}_1^n$.

Repeating this procedure stabilized after a few trials on a unique set of points $\{z_i\}_1^l$, at which the maximum errors were equal and irreducible. In the examples tried in Nuttall,3 the number of peaks $l$ at which the magnitude error $|e_n(z;a)|$ was largest and equal turned out to be $n + 1$. This recursive approach is discussed further in Sec. II.
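The recursive procedure above is closely related to the classical Remez exchange algorithm. The following is a simplified Remez-style exchange for real basis functions on an interval, offered as a sketch under stated assumptions:

- The basis is assumed to satisfy the Haar condition. The discrete minimax problem on $n + 1$ reference points then reduces to a linear "levelled error" system.
- Peaks are located on a dense grid rather than by solving $e_n'(z;a) = 0$.
- Plain Gaussian elimination replaces the Barrodale and Phillips solver.

All names are illustrative.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= t * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def exchange(f, basis, lo, hi, grid_size=2001, iters=20):
    """Remez-style exchange for real minimax approximation of f on [lo, hi]."""
    n = len(basis)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    ref = [lo + (hi - lo) * i / n for i in range(n + 1)]  # initial n+1 points
    for _ in range(iters):
        # Levelled-error system: f(z_i) - sum_k c_k h_k(z_i) = (-1)^i E.
        A = [[h(z) for h in basis] + [(-1.0) ** i] for i, z in enumerate(ref)]
        sol = solve_linear(A, [f(z) for z in ref])
        coef, level = sol[:n], sol[n]
        err = [f(z) - sum(c * h(z) for c, h in zip(coef, basis)) for z in grid]
        # Peaks of |error| on the grid (a grid search replaces solving e' = 0).
        peaks = [i for i in range(grid_size)
                 if (i == 0 or abs(err[i]) >= abs(err[i - 1]))
                 and (i == grid_size - 1 or abs(err[i]) >= abs(err[i + 1]))]
        # Keep one peak per run of equal sign, so that the signs alternate.
        alt = []
        for i in peaks:
            if alt and (err[i] >= 0) == (err[alt[-1]] >= 0):
                if abs(err[i]) > abs(err[alt[-1]]):
                    alt[-1] = i
            else:
                alt.append(i)
        # Trim to n+1 points by discarding the smaller of the two end peaks.
        while len(alt) > n + 1:
            alt.pop(0 if abs(err[alt[0]]) < abs(err[alt[-1]]) else -1)
        new_ref = [grid[i] for i in alt]
        if len(new_ref) < n + 1 or new_ref == ref:
            break  # reference set unchanged (converged) or degenerate
        ref = new_ref
    return coef, abs(level), max(abs(e) for e in err)


coef, level, emax = exchange(math.exp, [lambda x: 1.0, lambda x: x], 0.0, 1.0)
```

As an example, approximate $e^x$ on $[0, 1]$ by $a + bx$ (so $n = 2$). The loop should settle within a couple of exchanges on the reference $\{0, \ln(e-1), 1\}$, to grid accuracy. It gives $b = e - 1 \approx 1.71828$, $a \approx 0.89407$, and a levelled error of about $0.10593$. The errors at these $n + 1 = 3$ points are equal in magnitude and alternate in sign, matching the "$n + 1$ equal peaks" behavior reported above.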

Our method, as presented in the Appendix, is not inherently restricted to arrays of any particular geometry, but it does assume that interelement effects (mutual coupling) can be ignored. The method can still be applied in the most general case of a spatial or volumetric array. All the functions in Eq. (1) are then functions of spherical coordinates $(\theta, \varphi)$. The finite domain of approximation therefore becomes an appropriately chosen finite set $\{(\theta_k, \varphi_k)\}$ instead of a set of complex numbers. This difference does not affect the mathematical properties of our method in any way. It affects only the size of the numerical problem to be solved and, consequently, the computer effort required to solve it. For large enough arrays, that effort ultimately becomes prohibitive; where that point lies depends upon the designer and the application.

Although we apply our method only to single-frequency array design problems, it can also be applied to broadband design by sampling in frequency space as well. This again adds to the computer effort of solution, but does not affect the basic mathematical method.

We use no weighting function in Eq. (2), so the resulting far-field beam patterns have a level side-lobe structure. For example, our method can reproduce the classical Dolph-Chebyshev array design. If such a level side-lobe structure is not desired, an appropriate weighting function is easily incorporated into Eq. (2) without altering the algorithm in any essential way.

I. APPLICATION TO ARRAY DESIGN WITH A CONSTRAINT

Consider a linear antenna array with $N$ elements located at arbitrary fixed positions $\{x_k\}_1^N$. It receives a plane-wave arrival of wavelength $\lambda$ from direction $\theta_0$, $-\pi/2 < \theta_0 < \pi/2$, measured relative to a normal to the array. If the array is steered to look in direction $\theta_l$, $-\pi/2 < \theta_l < \pi/2$, then the complex transfer function of the beamformer is

$$T(u) = \sum_{k=1}^{N} w_k \exp(-i d_k u), \qquad (3)$$

where $\{w_k\}_1^N$ are the element weights, and

$$d_k = 2\pi x_k/\lambda, \quad 1 \le k \le N, \qquad u = \sin\theta_0 - \sin\theta_l.$$

Observe that the total range of $u$ depends on the look direction $\theta_l$. For example, if $\theta_l = 0$, then the range of $u$ is the closed interval $[-1, 1]$. The peak response of $T(u)$ should occur at $u = 0$, so we normalize (without loss of generality) according to

$$T(0) = 1 = \sum_{k=1}^{N} w_k.$$

To realize small side lobes, we must minimize $|T(u)|$ for all $u$ values in some subset $U$ of the total range of $u$. For example, if $\theta_l = 0$, the total range of $u$ is $[-1, 1]$. Then $U$ could be the union of the intervals $[-1, -u_0]$ and $[u_0, 1]$, where $u_0 > 0$ is chosen small relative to 1. For the special case of real weights, Eq. (3) gives $T(-u) = T^*(u)$, so we could confine attention to $U = [u_0, 1]$. The normalization constraint is most easily accounted for by solving for $w_N$ and eliminating it. We then obtain

$$T(u) = \exp(-i d_N u) - \sum_{k=1}^{N-1} w_k \left[\exp(-i d_N u) - \exp(-i d_k u)\right]. \qquad (4)$$

This problem now fits the framework of Eq. (A1) in the Appendix if we identify

$$z = u, \quad n = N - 1, \quad e_n(z;a) = T(u), \quad f(z) = \exp(-i d_N u), \quad a_k = w_k,$$
$$h_k(z) = \exp(-i d_N u) - \exp(-i d_k u), \quad Q_m = \text{finite subset of } U. \qquad (5)$$
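The identification (5) can be checked numerically. The code below builds $f$ and $h_k$ from element positions as in Eq. (5). It then confirms that $f(z) - \sum_k a_k h_k(z)$ reproduces $T(u)$ of Eq. (3) once $w_N = 1 - \sum_{k<N} w_k$. The positions and weights are arbitrary illustrative values, deliberately unequally spaced, and the helper names are hypothetical.

```python
import cmath
import math


def eq5_terms(x, wavelength, u):
    """f(z) and h_1(z), ..., h_{N-1}(z) of Eq. (5) at z = u (illustrative sketch)."""
    d = [2 * math.pi * xk / wavelength for xk in x]        # d_k = 2*pi*x_k/lambda
    f = cmath.exp(-1j * d[-1] * u)                        # f(z) = exp(-i d_N u)
    h = [f - cmath.exp(-1j * dk * u) for dk in d[:-1]]    # h_k(z), k = 1..N-1
    return f, h


def error_curve(a, x, wavelength, u):
    """e_n(z;a) = f(z) - sum_k a_k h_k(z), with n = N - 1 free weights a_k = w_k."""
    f, h = eq5_terms(x, wavelength, u)
    return f - sum(ak * hk for ak, hk in zip(a, h))


def transfer(w, x, wavelength, u):
    """Direct evaluation of Eq. (3): T(u) = sum_k w_k exp(-i d_k u)."""
    return sum(wk * cmath.exp(-2j * math.pi * xk / wavelength * u)
               for wk, xk in zip(w, x))


x = [0.0, 0.37, 1.1, 1.5]     # element positions, in wavelengths (illustrative)
a = [0.2, 0.3, 0.1]           # free weights w_1..w_{N-1}
w = a + [1.0 - sum(a)]        # w_N eliminated through the constraint sum(w) = 1
for u in (0.0, 0.25, -0.6, 0.9):
    assert abs(error_curve(a, x, 1.0, u) - transfer(w, x, 1.0, u)) < 1e-12
```

Eliminating $w_N$ in this way makes the normalization $T(0) = 1$ hold automatically. The remaining $N - 1$ weights are then unconstrained, which is the form the Appendix algorithm expects.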

So far nothing has been said about whether the weights $\{w_k\}$ are real or complex. This distinction depends upon the application and the capability of the beamformer. Both cases fit the above framework. The only difference is that complex weights give twice as many unknowns to solve for as real weights.

If the array is half-wavelength equispaced, then the computed element weights will be identical to the classical Dolph-Chebyshev weights. In this instance they can be computed analytically. The general case of arbitrary spacings cannot be computed analytically; yet the algorithm presented in this paper can always be applied.

In the remainder of this section, we presume that the elements are equispaced at half-wavelength. Then $x_k = k\lambda/2$ and Eq. (3) becomes

$$T(u) = \sum_{k=1}^{N} w_k \exp(-i\pi k u). \qquad (6)$$

Observe now that $T(u)$ in Eq. (6) has period 2 in $u$. This holds whether the weights $\{w_k\}$ are real or complex, and whether or not some elements have failed (i.e., have zero weight values). We can therefore study and control $T(u)$ in Eq. (6) over any convenient $u$ interval of length 2, and need not confine our investigation to $[-1, 1]$. In particular, we concentrate on the $u$ interval $[0, 2]$ in what follows.

As an illustration of what the minimization technique of this paper can do, we first designed a 50-element, half-wavelength, equispaced linear array for peak side lobes of −30 dB relative to the main peak. This is, of course, a standard Dolph-Chebyshev case. It gives −30-dB side lobes throughout the $u$ range $[u_0, 2 - u_0]$, where $u_0 = 0.0538117$ (Ref. 3). Then 10% of the elements were randomly eliminated from the array while the remaining weights were left unchanged. This corresponds to five elements failing in the array. Fig. 1 shows the relative response of this particular array, with elements 7, 22, 40, 43, and 50 failed. The peak side lobe has increased from −30 to −21.58 dB, a degradation of 8.4 dB, and the side-lobe peaks vary widely in size.

FIG. 1. Relative pattern for five elements failed.
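The value of $u_0$ quoted above can be reproduced from the standard Dolph-Chebyshev relations. For a half-wavelength equispaced $N$-element array, the pattern is proportional to $T_{N-1}(x_0 \cos(\pi u/2))$, with $x_0 = \cosh(\cosh^{-1}(R)/(N-1))$ and $R$ the main-lobe to side-lobe voltage ratio. The main lobe therefore first falls to the side-lobe level where $x_0 \cos(\pi u_0/2) = 1$. The helper below is a sketch with a hypothetical name. The same relation appears, in wavenumber form, to be what line 177 of the listing computes as Ko=(2./Space)*ACS(1./Zo), assuming Zo there holds $x_0$.

```python
import math


def dolph_mainlobe_edge(n_elements, sidelobe_db):
    """u0 at which a half-wavelength equispaced Dolph-Chebyshev pattern first
    falls to the side-lobe level (sketch using standard Dolph-Chebyshev relations)."""
    r = 10 ** (sidelobe_db / 20)                      # main-lobe to side-lobe ratio
    x0 = math.cosh(math.acosh(r) / (n_elements - 1))  # Chebyshev argument scale
    # Pattern is proportional to T_{N-1}(x0 cos(pi u / 2)); side lobes occupy
    # |x0 cos(pi u / 2)| <= 1, whose first boundary is x0 cos(pi u0 / 2) = 1.
    return (2 / math.pi) * math.acos(1 / x0)


print(f"{dolph_mainlobe_edge(50, 30):.7f}")  # 0.0538117
```

For $N = 50$ and −30 dB this gives $u_0 \approx 0.0538117$, in agreement with the figure quoted in the text.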

We then applied our method to this defective array with $p = 2$ and $m = 251$ equispaced points in $[u_0, 2 - u_0]$. The remaining 45 elements were weighted with real coefficients, subject to two constraints: the mainlobe width must be the same as that of the ideal 50-element array, and the steering range in $u$ must also be the same. Fig. 2 shows the resultant array pattern. The peak side lobe is now −23.62 dB, an improvement of 2.04 dB over Fig. 1. However, the side-lobe levels still vary significantly, because there are too few phase controls, namely only $p = 2$.

Increasing the parameter values to $p = 8$, $m = 501$ gives the best real weights displayed graphically in Fig. 3; the corresponding array pattern is given in Fig. 4. The gaps in Fig. 3 at locations 7, 22, 40, 43, and 50 correspond to zero weighting at the failed elements. The weights are, in general, a bell-shaped set of all positive numbers. However, the actual weight values fluctuate significantly, on the order of 10%. The pattern in Fig. 4 has a peak side lobe of −25.20 dB. This is an improvement of 3.62 dB over Fig. 1, but still 4.80 dB poorer than the ideal 50-element array.

FIG. 2. Relative pattern for p = 2, m = 251, real weights.

FIG. 3. Best real weights for p = 8, m = 501.

We next allowed the weights to be complex and minimized the maximum side lobe over the same steering range $[u_0, 2 - u_0]$, for $p = 2$ and $m = 501$ equispaced points in $[u_0, 2 - u_0]$. The best complex weights turned out to be virtually pure real, and the corresponding pattern was almost identical to Fig. 2. Taking $p = 8$, $m = 501$ gave a much improved pattern for complex weights. In fact, the best complex weights were real (to within a small relative error), and the pattern was the same as Fig. 4.

We had anticipated a better pattern for the complex weight case than for the real weights, but that did not materialize: the best complex weights for this equispaced linear array with five missing elements were real. The reason for this behavior is unknown. It is, however, an encouraging result from the array design viewpoint. It indicates that there is no need to allow phasing at the individual elements, since gain alone achieves all the side-lobe reduction that can be achieved. This conclusion is drawn only for the half-wavelength equispaced line array with omnidirectional element response. (Recently, Lewis and Streit6 proved a related result for a general line array steered through the same number of degrees either side of broadside. Within the collection of all sets of best complex weights, there always exists a set of real weights. Thus complex weights are not needed for line arrays to achieve the best possible side-lobe levels.)

FIG. 4. Relative pattern for p = 8, m = 501.

The use of linear programming to design antenna arrays is not entirely new. McMahon et al.7 and Wilson8 used linear programming to synthesize desired complex transfer functions to within 3 dB of the best possible side-lobe level. Their method corresponds exactly to taking $p = 2$ in the method presented in this paper, i.e., treating only the real and imaginary parts of Eq. (2).

The computations for the four cases were as follows.

- Real weights, Fig. 2 ($p = 2$, $m = 251$, $n = 44$): 1.2 min and 205 simplex iterations.
- Real weights, Fig. 4 ($p = 8$, $m = 501$, $n = 44$): 38.4 min and 402 simplex iterations.
- Complex weights, $n = 88$, with $p$ and $m$ as for Fig. 2: 7.0 min and 657 simplex iterations.
- Complex weights, $n = 88$, with $p$ and $m$ as for Fig. 4: 179 min and 1262 simplex iterations.

The two cases with the smallest CPU times encountered almost no system overhead due to program size. The two cases with the largest CPU times, however, encountered very significant system overhead. Their large memory requirements caused heavy use of the virtual memory feature of the DEC VAX 11/780. The 38.4-min case required over 3.6 million page faults, while the 179-min case required over 11 million page faults. Bear in mind that the DEC VAX 11/780 is essentially a minicomputer. Without virtual memory, only the largest mainframe computers could have solved either of these two problems.

II.  EFFORTS  TO  EXTEND  THE  METHOD 

Our basic problem is to minimize the maximum magnitude of the complex error

$$e_n(z;a) = f(z) - \sum_{k=1}^{n} a_k h_k(z) \qquad (7)$$

over a continuum of values of $z$, when $f$, $\{h_k\}$, and $\{a_k\}$ are complex. To make the problem computable, we immediately approximate it by discretizing the $z$ variable to a finite number of values. In addition, at any $z$ value of interest, we discretize the number of phase errors we are willing to consider. To be specific: the algorithm in Barrodale and Phillips1,2 applies only to real quantities, so we consider the "projection" of a rotated version of the complex error:

$$P(z,\psi) = \mathrm{Re}\{\exp(i\psi)\, e_n(z;a)\}. \qquad (8)$$

The argument of the complex error [Eq. (7)] is unknown a priori. We therefore let $\psi$ take on a finite set of values spread over any $\pi$-radian interval, and minimize the magnitude of the projection [Eq. (8)] over all these selected $\psi$ values. This is equivalent to the method of the Appendix.
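The following numerical check shows why discretizing $\psi$ costs little. Take $p$ angles $\psi_j = j\pi/p$ spread over $\pi$ radians. For any complex error $e$, the largest projection $\max_j |\mathrm{Re}(e^{i\psi_j} e)|$ then lies between $|e|\cos(\pi/2p)$ and $|e|$. For $p = 2$ the worst-case factor is $\cos(\pi/4)$, about 3 dB. This is consistent with the 3-dB figure quoted earlier for the $p = 2$ methods of McMahon et al. and Wilson. The choice $\psi_j = j\pi/p$ and the function names are assumptions made for this sketch.

```python
import cmath
import math


def max_projection(e, p):
    """max_j |Re(exp(i psi_j) e)| over p angles psi_j = j*pi/p spread over pi radians."""
    return max(abs((cmath.exp(1j * j * math.pi / p) * e).real) for j in range(p))


def check_bounds(e, p):
    """|e| cos(pi/(2p)) <= max_projection(e, p) <= |e| holds for every complex e."""
    m = max_projection(e, p)
    return abs(e) * math.cos(math.pi / (2 * p)) - 1e-12 <= m <= abs(e) + 1e-12


samples = [1 + 0j, 1j, -2 + 3j, 0.3 - 0.7j, cmath.exp(1j * math.pi / 4)]
assert all(check_bounds(e, p) for e in samples for p in (2, 3, 8, 32))

# Worst case for p = 2 (error at 45 degrees): the loss factor is sec(pi/4), about 3 dB.
print(round(20 * math.log10(1 / math.cos(math.pi / 4)), 2))  # 3.01
```

As $p$ grows, $\cos(\pi/2p) \to 1$, so the projected problem approaches the true complex minimax problem. The price is $mp$ real constraints instead of $2m$.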

A perturbation method was put forth9 in an effort to eliminate this second discretization process in $\psi$. It claimed guaranteed convergence to the optimum weights for any given finite discrete set of $z$ values. Applied to the examples in Barrodale et al.,9 the proposed perturbation technique did indeed converge. We then applied it to the following example: approximation of $\exp(i3x)$ by the three basis functions $1$, $\exp(ix)$, and $\exp(i2x)$ over 100 equispaced points in the domain $[0, \pi/4]$ in $x$. Here it sometimes failed to converge, depending on the initial weights employed. The reason for this failure is that the "direction of the minimum" furnished by the perturbation is often totally irrelevant, and the best scale factor to apply to this perturbation is very small. The result is a small random meander in the coefficient space, with occasional convergence to a nonoptimum point. A modification of this technique was attempted in which the magnitude of the perturbation was bounded. Although this improved the situation somewhat, convergence to the optimum was still not always obtained.

It was thought that this meander in coefficient space might be eliminated by tracking the exact $z$ values at which Eq. (7) is a maximum. Recall the real case discussed in the Introduction. There, convergence to the absolute optimum over a continuum of real $z$ values was achieved in a practical example by re-evaluating the $z$ points of maximum error and using these in a recursive approach. We extended this idea to the two continuous variables $z$ and $\psi$ in Eq. (8), retaining only the $2n + 1$ largest error points, but convergence was not obtained. We then replaced the single "point" of a maximum, i.e., a pair of values $(z_k, \psi_k)$, by a "patch", i.e., a set of values $\{(z_{kl}, \psi_{kl})\}$ covering the maximum point $(z_k, \psi_k)$. With this change, convergence to the absolute optimum was apparently achieved for the examples considered. The patch width in $\psi$ was of the order of a degree in most cases. The problem with this latter modification is that the error function and its derivative must be evaluated a large number of times. Moreover, the improvement over the method of the Appendix is insignificant when $p$ there is large.

If the final error in Eq. (7), after application of the method of the Appendix, is inadequate due to inadequate sampling in z and/or ψ, it is possible, for a given coefficient set {a_k}, to locate the point (z_m, ψ_m) at which Eq. (8) is largest, and then use a gradient approach to decrease this maximum error at (z_m, ψ_m). Of course, the particular point of maximum will jump around as the set {a_k} is perturbed; nevertheless, the technique does converge (although slowly) and does lead to smaller errors at the maximum of Eq. (8) over a continuum in z and ψ.

III.  DISCUSSION  AND  SUMMARY 

It has been observed that two of the locations of maximum magnitude error often occur at the endpoints if the specified domain in Eq. (2) is a real interval. (For example, see Figs. A1 and A2. The example of real coefficients in Fig. A1 had one of the maximum error points at one endpoint, but not the other. However, if we had specified the domain [−π/4, π/4] in that example, we would have observed four peak-error points, two of which would have been at the endpoints, due to the conjugate property of the desired function and the basis functions.) Since the endpoints may be the only locations of maximum error that we can anticipate a priori, an obviously useful procedure is to use more values of phase shift ψ in Eq. (8) [alternatively, the angles {θ_j} in Lemma A2] at the endpoints than in the interior, so as to better control these very likely locations of maximum error. For example, we might use p = 6 in the interior of a specified real interval domain of z and use p = 12 or 20 at the two endpoints. This does not add greatly to the total computation, since there are generally far more interior points than (two) endpoints. The program in Streit and Nuttall⁴ may be readily used with different values of p at different data points.

185 J. Acoust. Soc. Am., Vol. 72, No. 1, July 1982

The p different phase shifts ψ selected in Eq. (8) have been chosen here to be equally spaced over a 180° span (along with their 180° mates). This is the most reasonable selection in the absence of a priori knowledge of the complex error magnitude and phase, because it gives the best upper bound in Lemma A2 of any set of phases. However, one could select any value of ψ to investigate the error; for example, different sets of values of ψ could be used at various values of abscissa z. The program in Streit and Nuttall⁴ may be used with any desired set of phases at any, or all, of the data points.
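The endpoint-weighted sampling suggested above can be sketched as follows. This is a minimal illustration, not the program of Streit and Nuttall. The function name `phase_samples` and its defaults are assumptions; the defaults are p = 6 in the interior and p = 12 at the endpoints, taken from the example in the text. For each abscissa the function lists the pairs (z, θ) with θ_j = π(j − 1)/p, j = 1, …, p, which gives one representative of each pair of phases 180° apart.

```python
import math


def phase_samples(points, p_interior=6, p_endpoint=12):
    """Return (z, theta) pairs used to sample the projected error.

    Hypothetical helper: each point gets p angles theta_j = pi*(j-1)/p,
    j = 1..p, with a larger p at the first and last points of the interval.
    """
    pairs = []
    last = len(points) - 1
    for i, z in enumerate(points):
        p = p_endpoint if i in (0, last) else p_interior
        for j in range(p):  # j = 0..p-1 stands for theta_1..theta_p
            pairs.append((z, math.pi * j / p))
    return pairs


# Eleven equispaced abscissas on [0, pi/4]
points = [(math.pi / 4) * i / 10 for i in range(11)]
pairs = phase_samples(points)
print(len(pairs))  # 9 * 6 + 2 * 12 = 78 rows, versus 66 with p = 6 everywhere
```

Each (z, θ) pair corresponds to one real equation of the overdetermined system of the Appendix. Raising p only at the two endpoints therefore adds just a handful of rows.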

The potential for significant round-off error accumulation is always present in linear Chebyshev complex function approximation. For example, in approximating f(x) = cos(12x) + i sin(3x) by a complex linear combination of the 12 basis functions 1, exp(ix), …, exp(i11x) on the interval [0, π/4], the complex coefficients of best approximation were observed to be large in magnitude and to lie in all quadrants of the complex plane; therefore significant numerical round-off error occurred during computation of the residuals within algorithm ACM 495. Even if the coefficients of best approximation had happened to be better behaved, serious cancellation error might still occur in some problems because of the very nature of complex arithmetic. It might, therefore, be wise to use a double-precision version of algorithm ACM 495 routinely in complex Chebyshev approximation problems to alleviate such cancellation errors.

A sensitivity analysis on the optimum coefficients may be in order in some applications to determine their utility. This consideration is completely independent of their numerical accuracy. For example, in an antenna array design problem where some elements are spaced significantly less than a half-wavelength apart, it might well turn out that the optimum coefficients need to be specified with a relative error of better than 10⁻⁶. Then, although the mathematical results may be correct and accurate, practical usage is precluded. This sensitivity can be determined by perturbing the optimum weights by a few percent and observing whether a drastic change occurs in the desired sidelobe behavior. (Such arrays are referred to as superdirective arrays.)
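One way to carry out the perturbation test just described is sketched below. The sketch makes several assumptions. It uses the Appendix example (A8), approximating exp(i3x) by 1, exp(ix), exp(i2x) on [0, π/4], in place of an antenna pattern. It evaluates the error on 2001 equispaced points. It perturbs every coefficient by ±rel over all sign patterns. The function names are hypothetical, and the coefficients are the real-weight values for m = 1001, p = 54 from Table AI.

```python
import cmath
import itertools
import math


def max_error(a, num_points=2001):
    """Max |exp(i3x) - sum_k a_k exp(i(k-1)x)| over equispaced x in [0, pi/4]."""
    worst = 0.0
    for i in range(num_points):
        x = (math.pi / 4) * i / (num_points - 1)
        e = cmath.exp(3j * x) - sum(ak * cmath.exp(1j * k * x) for k, ak in enumerate(a))
        worst = max(worst, abs(e))
    return worst


def worst_perturbed_error(a, rel):
    """Largest max error when each a_k is scaled by (1 + rel) or (1 - rel)."""
    return max(
        max_error([ak * (1 + s * rel) for ak, s in zip(a, signs)])
        for signs in itertools.product((-1, 1), repeat=len(a))
    )


a = [0.853443, -2.314772, 2.420138]  # Table AI, m = 1001, p = 54
nominal = max_error(a)
for rel in (0.01, 0.02, 0.05):
    print(f"{rel:.0%} perturbation: max error {worst_perturbed_error(a, rel):.4f} "
          f"(nominal {nominal:.4f})")
```

Deciding whether a given growth is "drastic" is an application judgment. For a closely spaced array, the same loop would be run on the array's sidelobe level instead of this error curve.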

APPENDIX: MATHEMATICAL THEORY AND ALGORITHM

Let f and h_1, …, h_n be complex-valued functions defined on the finite discrete point set Q_m = {z_1, …, z_m}. For a complex vector a = (a_1, …, a_n) ∈ Cⁿ, define the complex error

$$e_n(z;a) = f(z) - \sum_{k=1}^{n} a_k h_k(z), \qquad z \in Q_m. \qquad \text{(A1)}$$

The discrete linear Chebyshev approximation problem is to find a complex vector a = (a_1, …, a_n) ∈ Cⁿ so that





$$E_m(f) \equiv \min_{\alpha \in \mathbb{C}^n} \max_{z \in Q_m} |e_n(z;\alpha)| = \max_{z \in Q_m} |e_n(z;a)|. \qquad \text{(A2)}$$

The quantity E_m(f) is called the discrete Chebyshev, or minimax, error of the approximation on the point set Q_m. (The restriction of a to real values is discussed below.)

We do not solve this problem exactly. An algorithm presented in Barrodale et al.⁹ for its solution is erroneous; we have discovered examples (see Sec. II) for which the recursive procedure described there need not converge to a solution of Eq. (A2). We will show that problem (A2) can be replaced by a related approximate problem solvable by available linear programming techniques. The exact solution of this related problem yields approximate solutions of Eq. (A2). The error in these approximate solutions to Eq. (A2) can be determined and, in fact, made arbitrarily small, using the results stated below; see Theorems A1 and A2.

It can be shown by standard mathematical methods¹⁰ that a vector a satisfying Eq. (A2) exists, although it may not be unique. Sufficient conditions are known that result in a unique a, but we do not need these conditions here. Therefore no further assumptions on f, {h_k}, or the point set Q_m are made. In order to proceed, we need the following results. Proofs of all these results are given in Streit and Nuttall.⁴
Lemma A1. If z = x + iy, where x and y are real, then

$$|z| = \max_{-\pi < \theta \le \pi} \,(x\cos\theta + y\sin\theta).$$

Lemma A2. Let θ_j = π(j − 1)/p, j = 1, 2, …, 2p, where the integer p ≥ 2. Let z = x + iy, and let

$$M = \max_{j=1,\dots,2p} \,(x\cos\theta_j + y\sin\theta_j).$$

Then

$$M \le |z| \le M \sec[\pi/(2p)].$$
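Lemma A2 is easy to check numerically. The sketch below computes M for a complex number z and confirms that M ≤ |z| ≤ M sec[π/(2p)]; the function names are illustrative. The worst case is a z whose argument lies midway between two adjacent angles θ_j, where M = |z| cos[π/(2p)] exactly. As p grows, M approaches |z|, consistent with Lemma A1.

```python
import cmath
import math


def projected_max(z, p):
    """M = max over theta_j = pi*(j-1)/p, j = 1..2p, of x cos(theta_j) + y sin(theta_j)."""
    return max(
        z.real * math.cos(math.pi * j / p) + z.imag * math.sin(math.pi * j / p)
        for j in range(2 * p)
    )


def lemma_a2_holds(z, p, tol=1e-12):
    """Check M <= |z| <= M * sec(pi/(2p)) up to floating-point tolerance."""
    m = projected_max(z, p)
    return m <= abs(z) + tol and abs(z) <= m / math.cos(math.pi / (2 * p)) + tol


for p in (2, 6, 18, 54):
    worst = cmath.rect(1.0, math.pi / (2 * p))  # midway between theta_1 and theta_2
    print(p, projected_max(worst, p), math.cos(math.pi / (2 * p)))
```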

We are now in a position to describe a problem that we can solve exactly and that is related to the given discrete linear Chebyshev approximation problem (A2). Let the real and imaginary parts of the complex error e_n(z;a) be denoted by R_n(z;a) and I_n(z;a), respectively. For notational convenience, we define, for any complex vector a ∈ Cⁿ,

$$G_j(z;a) = R_n(z;a)\cos\theta_j + I_n(z;a)\sin\theta_j, \qquad j = 1,\dots,2p, \qquad \text{(A3)}$$

where θ_1, …, θ_2p are the angles given explicitly in Lemma A2.

We seek a complex vector a = (a_1, …, a_n) ∈ Cⁿ satisfying

$$M_{mp}(f) \equiv \min_{\alpha \in \mathbb{C}^n} \max_{z \in Q_m} \max_{j=1,\dots,2p} G_j(z;\alpha) = \max_{z \in Q_m} \max_{j=1,\dots,2p} G_j(z;a). \qquad \text{(A4)}$$

With standard mathematical methods, it is easy to see that at least one such vector a ∈ Cⁿ exists. The connection between problem (A4) and problem (A2) is explored in the next few results.

Theorem A1. Let p ≥ 2 be an integer, and let θ_j = π(j − 1)/p, j = 1, 2, …, 2p. Then

$$M_{mp}(f) \le E_m(f) \le M_{mp}(f)\sec[\pi/(2p)].$$

Theorem A2. Let p ≥ 2 be an integer, and let θ_j = π(j − 1)/p, j = 1, 2, …, 2p. Let



$$W_{mp}(f) = \max_{z \in Q_m} |e_n(z;a)|,$$

where the complex vector a ∈ Cⁿ is any vector satisfying (A4). Then

$$E_m(f) \le W_{mp}(f) \le E_m(f)\sec[\pi/(2p)].$$

Corollary A2.1. Under the conditions of Theorem A2,

$$M_{mp}(f) \le E_m(f) \le W_{mp}(f).$$

The preceding corollary evidently gives excellent upper and lower bounds on the discrete linear Chebyshev approximation error E_m(f), and these bounds are readily available after the numerical computation of a ∈ Cⁿ and W_mp(f) has been completed. We point out that the above two theorems substantially generalize results in Barrodale et al.,⁹ p. 854.

Using the Maclaurin series for sec x in Theorem A2 gives the relative discrepancy

$$0 \le \frac{W_{mp}(f) - E_m(f)}{E_m(f)} \le \sec[\pi/(2p)] - 1 = \frac{\pi^2}{8p^2} + O\!\left(\frac{1}{p^4}\right).$$

Note that this upper bound on the relative error is independent of f, the point set Q_m, the basis functions {h_k}, and n.
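The short computation below shows how quickly the relative discrepancy shrinks. It is an illustrative sketch that compares the exact bound sec[π/(2p)] − 1 with its leading term π²/(8p²) for the values of p used in the numerical example of this Appendix. For p = 54 the bound is about 0.04 percent.

```python
import math


def exact_bound(p):
    """sec(pi/(2p)) - 1, the bound on (W_mp - E_m)/E_m from Theorem A2."""
    return 1.0 / math.cos(math.pi / (2 * p)) - 1.0


def leading_term(p):
    """First term of the Maclaurin expansion: pi^2 / (8 p^2)."""
    return math.pi ** 2 / (8 * p ** 2)


for p in (2, 6, 18, 54):
    print(f"p = {p:2d}: exact {exact_bound(p):.6f}, leading term {leading_term(p):.6f}")
```

The remaining terms of the series for sec x are all positive, so the leading term slightly underestimates the bound. The two agree closely once p is moderately large.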

We will now explicitly formulate an overdetermined system of real linear equations, to be solved in the Chebyshev norm (to be defined), which is equivalent to solving problem (A4). Referring to the choice of the θ_j's in Lemma A2, we observe that θ_(p+j) = π + θ_j, j = 1, …, p, and so, from Eq. (A3), we have

$$G_{p+j}(z;a) = -G_j(z;a), \qquad j = 1,\dots,p.$$

Therefore, we may rewrite Eq. (A4) as

$$M_{mp}(f) = \min_{\alpha \in \mathbb{C}^n} \max_{i=1,\dots,m} \max_{j=1,\dots,p} |G_j(z_i;\alpha)|. \qquad \text{(A5)}$$

Now, breaking the following quantities into their real and imaginary components,

$$f(z) = u(z) + i\,v(z),$$
$$h_k(z) = r_k(z) + i\,s_k(z), \qquad k = 1,\dots,n,$$
$$a_k = b_k + i\,c_k, \qquad k = 1,\dots,n,$$

we may write

$$R_n(z;a) = u(z) - \sum_{k=1}^{n} b_k r_k(z) + \sum_{k=1}^{n} c_k s_k(z),$$
$$I_n(z;a) = v(z) - \sum_{k=1}^{n} b_k s_k(z) - \sum_{k=1}^{n} c_k r_k(z),$$
$$G_j(z_i;a) = u(z_i)\cos\theta_j + v(z_i)\sin\theta_j - \sum_{k=1}^{n} b_k\,[r_k(z_i)\cos\theta_j + s_k(z_i)\sin\theta_j] - \sum_{k=1}^{n} c_k\,[r_k(z_i)\sin\theta_j - s_k(z_i)\cos\theta_j].$$

Note that G_j(z_i;a) is a real linear equation in the 2n variables {b_k} and {c_k}, and that all the coefficients of this equation are computable directly from known data.

Define the mp × 2n real matrix B in the partitioned form

$$B = \begin{bmatrix} B_1 & D_1 \\ B_2 & D_2 \\ \vdots & \vdots \\ B_p & D_p \end{bmatrix},$$





FIG. A1. Error curves for real coefficients; m = 11.


with the m × n submatrices

$$B_j = R\cos\theta_j + S\sin\theta_j, \qquad D_j = R\sin\theta_j - S\cos\theta_j, \qquad j = 1,\dots,p,$$

where R and S are real m × n matrices defined by

$$R = [r_k(z_i)], \qquad S = [s_k(z_i)].$$

Also, define the real vector

$$g = [g_{11},\dots,g_{m1},\dots,g_{1p},\dots,g_{mp}]^T$$

of length mp, where

$$g_{ij} = u(z_i)\cos\theta_j + v(z_i)\sin\theta_j, \qquad i = 1,\dots,m, \quad j = 1,\dots,p.$$

Finally, define the real vector

$$x = [b_1,\dots,b_n,c_1,\dots,c_n]^T$$

of length 2n. With this notation in hand, it is easily seen that the overdetermined system of mp equations in 2n unknowns

$$Bx = g \qquad \text{(A6)}$$

has a residual error vector, defined by g − Bx, whose mp components are precisely the mp real numbers


FIG. A2. Error curves for complex coefficients; m = 11.




G_j(z_i;a) arranged in a special order. Therefore problem (A5) can be solved by computing a solution to the overdetermined linear system (A6) in the Chebyshev norm; i.e., the largest-magnitude component of the residual vector g − Bx is minimized over all choices of the vector x.
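The construction of B and g can be checked on a small case. The sketch below is for illustration only: it is not ACM 495, and it does not solve the Chebyshev problem. It assembles the rows of (A6) in the block order [B_j D_j], j = 1, …, p, for the function f(z) = exp(i3z) with basis 1, exp(iz), exp(i2z) used in Eq. (A8). It then confirms that each component of g − Bx equals G_j(z_i;a) computed directly from the complex error. The coefficient vector is an arbitrary trial value, since the identity holds for any a, and the helper names are hypothetical.

```python
import cmath
import math


def build_system(f, basis, points, p):
    """Assemble B (mp x 2n) and g (length mp) of Eq. (A6), block j = 1..p outermost."""
    B, g = [], []
    for j in range(p):
        cos_t, sin_t = math.cos(math.pi * j / p), math.sin(math.pi * j / p)
        for z in points:
            hz = [h(z) for h in basis]
            # columns for b_k: r_k cos + s_k sin; columns for c_k: r_k sin - s_k cos
            row = [hv.real * cos_t + hv.imag * sin_t for hv in hz]
            row += [hv.real * sin_t - hv.imag * cos_t for hv in hz]
            B.append(row)
            fz = f(z)
            g.append(fz.real * cos_t + fz.imag * sin_t)
    return B, g


def residual(B, g, x):
    """Components of g - Bx."""
    return [gi - sum(bij * xj for bij, xj in zip(row, x)) for row, gi in zip(B, g)]


def G(f, basis, a, z, theta):
    """G_j(z;a) = Re(e) cos(theta) + Im(e) sin(theta), e = f(z) - sum_k a_k h_k(z)."""
    e = f(z) - sum(ak * h(z) for ak, h in zip(a, basis))
    return e.real * math.cos(theta) + e.imag * math.sin(theta)


def f(z):
    return cmath.exp(3j * z)


basis = [lambda z, k=k: cmath.exp(1j * k * z) for k in range(3)]
points = [(math.pi / 4) * i / 10 for i in range(11)]  # m = 11
p = 6
a = [0.4 + 0.9j, -2.0 - 2.0j, 2.6 + 1.1j]             # arbitrary trial coefficients
x = [ak.real for ak in a] + [ak.imag for ak in a]     # x = [b_1..b_n, c_1..c_n]

B, g = build_system(f, basis, points, p)
r = residual(B, g, x)
direct = [G(f, basis, a, z, math.pi * j / p) for j in range(p) for z in points]
print(len(B), len(B[0]), max(abs(u - v) for u, v in zip(r, direct)))
```

Dropping the c_k columns of each row gives the smaller system used when the coefficients are required to be real, as described later in this Appendix.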

This equivalent problem in linear algebra can, in principle, be solved exactly and in a finite number of steps using linear programming methods.¹,² Solutions of Eq. (A6) are not required to be unique; every solution of Eq. (A6) is a solution of Eq. (A5).

An excellent algorithm, which we will refer to as ACM 495, is available in the literature¹,² for solving the overdetermined system of equations Ax = b in the Chebyshev norm. A linear program is set up and solved by the algorithm, so that knowledge of linear programming techniques is not necessary to use the algorithm in practice. The computational procedure internal to the algorithm actually solves the dual of the primal linear program using a modification of the simplex method. The dual formulation of this problem is available.²,¹¹ We will not discuss the details of the linear programming technique in this paper.

A very simple modification⁹ of ACM 495 yields an algorithm for solving any real overdetermined system of linear equations in the Chebyshev norm subject to the additional constraints that all the residuals be non-negative. For a general system Ax = b of r equations in s unknowns, this problem takes the form

$$\min_{x_1,\dots,x_s} \; \max_{j=1,\dots,r} \left| b_j - \sum_{k=1}^{s} a_{jk} x_k \right|,$$

subject to the r constraints

$$b_j - \sum_{k=1}^{s} a_{jk} x_k \ge 0, \qquad j = 1,\dots,r.$$

The solution x_1, …, x_s returned by this modified algorithm is correct, even though the residuals returned may be in error. The correct residuals, if desired, must be calculated directly from the solution. Alternatively, if the residuals are required to be non-positive, then the same modified algorithm will work with A and b replaced by −A and −b, respectively.

Requiring non-negative residuals in the overdetermined system (A6) has interesting geometrical interpretations. For example, if we take p = 2 in Lemma A2, then θ_1 = 0 and θ_2 = π/2. Thus G_1(z;a) and G_2(z;a) are merely the real and imaginary parts of the complex error e_n(z;a), and the 2m components of the residual vector g − Bx are precisely the real and imaginary parts of e_n(z;a) evaluated at all m data points. Therefore, if system (A6) is required to have non-negative residuals, we have forced the error curve to lie entirely in the first quadrant of the complex plane. More generally, we may always constrain e_n(z;a) to lie in a given convex wedge-shaped sector of the complex plane with vertex at the origin, by making different, but appropriate, choices of the angles θ_1 and θ_2.
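The sector geometry can be made concrete. With non-negative residuals required, each angle θ confines e_n(z;a) to the half-plane within 90° of the direction θ. For 0 < θ_2 − θ_1 < π, the two half-planes intersect in the sector of directions from θ_2 − π/2 to θ_1 + π/2. The sketch below uses hypothetical helper names. It tests membership in that sector and picks θ_1, θ_2 for a desired sector [α, β] with 0 < β − α < π.

```python
import cmath
import math


def in_sector(e, theta1, theta2):
    """True when both residuals Re(e) cos(theta) + Im(e) sin(theta) are non-negative."""
    return all(e.real * math.cos(t) + e.imag * math.sin(t) >= 0.0 for t in (theta1, theta2))


def angles_for_sector(alpha, beta):
    """Angles theta1, theta2 whose half-planes intersect in alpha <= arg(e) <= beta."""
    if not 0.0 < beta - alpha < math.pi:
        raise ValueError("sector opening must lie strictly between 0 and pi")
    return beta - math.pi / 2, alpha + math.pi / 2


# p = 2 in Lemma A2 gives theta1 = 0, theta2 = pi/2: the first quadrant
print(angles_for_sector(0.0, math.pi / 2))
print(in_sector(1 + 1j, 0.0, math.pi / 2), in_sector(-1 + 0.5j, 0.0, math.pi / 2))
```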

Suppose, finally, that the complex solution vector a ∈ Cⁿ of problem (A4) is required to be strictly real, while f and {h_k} are complex. Then, in the vector x of Eq. (A6), c_1 = ⋯ = c_n = 0. Thus the overdetermined system Bx = g of mp equations in 2n unknowns can be replaced by a smaller system B̃x̃ = g of mp equations in only n unknowns,





TABLE AI. Coefficients for the real weight case.¹²

     m      p       a1           a2           a3
    11      2    0.936738    -2.443144     2.518388
    11      6    0.828404    -2.280319     2.396455
    11     18    0.858547    -2.321885     2.425096
    11     54    0.844146    -2.301461     2.410611
   101      2    0.936781    -2.443223     2.518458
   101      6    0.831314    -2.284548     2.399525
   101     18    0.865131    -2.331446     2.432033
   101     54    0.853823    -2.315301     2.420506
  1001      2    0.936785    -2.443232     2.518466
  1001      6    0.831237    -2.284448     2.399461
  1001     18    0.865213    -2.331571     2.432127
  1001     54    0.853443    -2.314772     2.420138

where the mp × n real matrix B̃ is defined in partitioned form by

$$\tilde{B} = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_p \end{bmatrix},$$

where the m × n submatrices B_j are unchanged from (A6), and the real vector x̃ = [b_1, …, b_n]^T. A solution of B̃x̃ = g in the Chebyshev norm can be computed using linear programming and algorithm ACM 495 as before.

We illustrate the procedure by approximating the complex function f(x) = exp(i3x) by a weighted sum of the basis functions 1, exp(ix), exp(i2x). That is, we seek to minimize the magnitude of the complex error curve

$$e_3(x) = \exp(i3x) - \sum_{k=1}^{3} a_k \exp[i(k-1)x] \qquad \text{(A8)}$$

over the interval [0, π/4], by choice of a_1, a_2, a_3, by solving problem M_mp(f) of Eq. (A4). Two cases are of interest: in the first, the coefficients {a_k} are restricted to be real, whereas in the second, these coefficients can be complex. The number m of equispaced x values at which Eq. (A8) is sampled is taken to be 11, 101, or 1001, thereby ensuring that the smaller sample sizes are subsets of the larger sizes. The value of p, which is half the number of phase-shifted values of Eq. (A8) employed in the error minimization, is taken to be 2, 6, 18, or 54, again ensuring the subset behavior of the smaller size cases. Note that p and the phase shifts {θ_j} are as given in Theorem A1.

The optimum real coefficients in Eq. (A8) for problem M_mp(f) are given in Table AI for these choices of m and p, and a plot of the magnitude of the error for several representative cases is given in Fig. A1. The best approximation of all cases considered is afforded by m = 1001, p = 54, and its error curve is plotted as a solid line; its maximum error is 0.1078, which is realized at two points in the interval [0, π/4]. The cases for smaller m (less sampling of the abscissa) and smaller p (less sampling of the phase of the complex error) are poorer; for example, the maximum error for m = 11, p = 2 is 0.1184, realized at only one point, namely x = π/4.
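The maximum errors quoted above can be reproduced from the tabulated coefficients. The sketch below evaluates |e_3(x)| of Eq. (A8) on 2001 equispaced points of [0, π/4], the same density as Table AIV. It does this for the real coefficients of Table AI with m = 1001, p = 54 and with m = 11, p = 2. It is only an after-the-fact check, not a solver. Because the tabulated coefficients are rounded to six decimals, agreement can be expected to only about five digits.

```python
import cmath
import math


def error(a, x):
    """Complex error e_3(x) = exp(i3x) - sum_k a_k exp(i(k-1)x) of Eq. (A8)."""
    return cmath.exp(3j * x) - sum(ak * cmath.exp(1j * k * x) for k, ak in enumerate(a))


def max_error(a, num_points=2001):
    """(max |e_3|, x at the max) over equispaced points in [0, pi/4]."""
    xs = [(math.pi / 4) * i / (num_points - 1) for i in range(num_points)]
    return max((abs(error(a, x)), x) for x in xs)


best = [0.853443, -2.314772, 2.420138]    # Table AI: m = 1001, p = 54
coarse = [0.936738, -2.443144, 2.518388]  # Table AI: m = 11, p = 2
for label, a in (("m=1001, p=54", best), ("m=11, p=2", coarse)):
    err, x_at = max_error(a)
    print(f"{label}: max |e| = {err:.4f} at x = {x_at:.4f}, "
          f"|e(pi/4)| = {abs(error(a, math.pi / 4)):.4f}")
```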

We have not plotted the other error curves with real coefficients for m = 101 and 1001, because they are indistinguishable from Fig. A1, as a perusal of Table AI shows. For example, the coefficients for m = 11, p = 2 are very close to those for m = 101, p = 2 and m = 1001, p = 2. Thus our sampling in x is already "fine enough" at m = 11. However, there is a significant change in the coefficients as p is varied for a fixed value of m; that is, p = 2 yields very coarse phase sampling of the error curve and should definitely be made larger.

The Chebyshev error curve (m = 1001, p = 54) in Fig. A1 realizes its maximum value at only n − 1 points, rather than at n + 1 points, where n = 3 is the number of coefficients for this example. This is probably related to the fact that we have minimized both the real and imaginary parts of the complex error, but have allowed ourselves to use only real coefficients.

The solution of the problem for complex weights is given in Table AII for the same choices of m and p as above. Again, the change in coefficient values is more marked with p than with m. Magnitude-error curves for m = 11 and 101 are given in Figs. A2 and A3, respectively; the curves for m = 1001 are indistinguishable from those for m = 101 and are not presented.

The  Chebyshev  error  curve  (m  =  1001,  p  =  54)  is  now 
symmetric  about  the  midpoint  of  the  interval  of  interest  and 
has  four  equal  error  peaks  of  value  0.0147.  This  error  is  7.3 
times  smaller  than  that  for  the  real  coefficient  case.  Also,  the 
number  of  equal  error  peaks  now  equals  1  plus  the  number  of 
coefficients;  whether  this  property  holds  generally  is  not 
known. 
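The complex-coefficient figures can be checked the same way. The sketch below evaluates the Table AII coefficients for m = 1001, p = 54 on 2001 equispaced points of [0, π/4]. It compares the errors at the two endpoints, since the curve is described as symmetric about the midpoint. It also forms the ratio to the real-coefficient maximum, using the Table AI values for the same m and p. As before, it only re-evaluates published coefficients.

```python
import cmath
import math


def abs_error(a, x):
    """|exp(i3x) - sum_k a_k exp(i(k-1)x)|, the magnitude of Eq. (A8)."""
    return abs(cmath.exp(3j * x) - sum(ak * cmath.exp(1j * k * x) for k, ak in enumerate(a)))


def max_error(a, num_points=2001):
    """Max of |e_3(x)| over equispaced x in [0, pi/4]."""
    return max(abs_error(a, (math.pi / 4) * i / (num_points - 1)) for i in range(num_points))


complex_a = [0.369179 + 0.890566j, -1.991954 - 1.991974j, 2.633175 + 1.091006j]  # Table AII
real_a = [0.853443, -2.314772, 2.420138]                                          # Table AI

e_c, e_r = max_error(complex_a), max_error(real_a)
print(f"complex: {e_c:.4f}, real: {e_r:.4f}, ratio: {e_r / e_c:.1f}")
print(f"|e(0)| = {abs_error(complex_a, 0.0):.5f}, |e(pi/4)| = {abs_error(complex_a, math.pi / 4):.5f}")
```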


TABLE AII. Coefficients for the complex weight case.

     m      p    Re(a1)     Im(a1)     Re(a2)      Im(a2)      Re(a3)     Im(a3)
    11      2   0.364737   0.954343   -2.021670   -2.119639   2.669023   1.153207
    11      6   0.378043   0.907888   -2.016657   -2.018598   2.648834   1.100488
    11     18   0.373079   0.898715   -2.003032   -2.003205   2.639992   1.094451
    11     54   0.371586   0.896504   -1.999352   -1.999473   2.637788   1.092947
   101      2   0.362962   0.953469   -2.018255   -2.119960   2.667544   1.154238
   101      6   0.376532   0.904026   -2.012095   -2.014055   2.646131   1.099461
   101     18   0.370549   0.893500   -1.995913   -1.997062   2.635782   1.093144
   101     54   0.368950   0.890017   -1.991172   -1.991196   2.632622   1.090777
  1001      2   0.362947   0.953499   -2.018253   -2.120028   2.667560   1.154275
  1001      6   0.376502   0.903926   -2.011979   -2.013914   2.646047   1.099417
  1001     18   0.370711   0.893848   -1.996440   -1.997545   2.636145   1.093278
  1001     54   0.369179   0.890566   -1.991954   -1.991974   2.633175   1.091006







FIG. A3. Error curves for complex coefficients; m = 101.


TABLE AIV. Maximum magnitude error, computed over 2001 equispaced points in [0, π/4].

     m      p    Real coefficients    Complex coefficients
    11      2    0.118396             0.017097
    11      6    0.108780             0.015142
    11     18    0.107890             0.015004
    11     54    0.107983             0.015005
   101      2    0.118415             0.017329
   101      6    0.108893             0.014946
   101     18    0.107967             0.014733
   101     54    0.107813             0.014711
  1001      2    0.118417             0.017331
  1001      6    0.108902             0.014950
  1001     18    0.107976             0.014735
  1001     54    0.107821             0.014712

Upper and lower bounds on the discrete Chebyshev error E_m(f) for the real and complex coefficient cases are given in Table AIII. These bounds are precisely those presented in Corollary A2.1. They correspond to sampling the complex error (A8) both in the abscissa x and in the phase of e_3(x). The lower bounds monotonically increase with increasing m or p. The upper bounds decrease with increasing p, but increase with increasing m. All these trends follow from the fact that smaller sample sizes are subsets of the larger sizes.
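Lemma A2 gives |e_n(z;a)| ≤ sec[π/(2p)] max_j G_j(z;a) at every sample point. The tabulated upper bound W_mp(f) can therefore exceed the lower bound M_mp(f) by at most the factor sec[π/(2p)]. The sketch below applies this check to the m = 101 rows of Table AIII. It allows a small assumed tolerance for the six-decimal rounding of the table. In several rows the ratio W/M nearly equals sec[π/(2p)].

```python
import math

# (p, lower bound M_mp, upper bound W_mp) transcribed from Table AIII, m = 101
REAL_M101 = [(2, 0.083731, 0.118414), (6, 0.105192, 0.108893),
             (18, 0.107556, 0.107967), (54, 0.107767, 0.107813)]
COMPLEX_M101 = [(2, 0.012252, 0.017328), (6, 0.014436, 0.014946),
                (18, 0.014677, 0.014733), (54, 0.014703, 0.014709)]


def check_rows(rows, tol=2e-6):
    """Return (p, W/M, sec(pi/(2p)), ok) with ok meaning M <= W <= M*sec + tol."""
    results = []
    for p, lower, upper in rows:
        sec = 1.0 / math.cos(math.pi / (2 * p))
        results.append((p, upper / lower, sec, lower <= upper <= lower * sec + tol))
    return results


for name, rows in (("real", REAL_M101), ("complex", COMPLEX_M101)):
    for p, ratio, sec, ok in check_rows(rows):
        print(f"{name:7s} p={p:2d}  W/M={ratio:.6f}  sec={sec:.6f}  ok={ok}")
```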

However, the maximum magnitude error, evaluated over the continuum of x values in the interval [0, π/4] (actually computed on a dense discrete sampling set), obeys none of these monotonic relations, as Table AIV demonstrates. For example, the maximum error in the real case for m = 11, p = 18 is less than that for m = 11, p = 54, although the corresponding upper bounds in Table AIII decrease with p. Also, the maximum error in the complex case for m = 11, p = 6 is greater than that for m = 101, p = 6, although the corresponding upper bounds increase with m. The reason for this behavior is that we have minimized a discrete approximation to our problem of interest, sampling both in the abscissa x and in the phase values of the complex error. However, the numerical discrepancies are small, as they must be for reasonably fine sampling in both variables. (A recursive gradient procedure could be used with any of these coefficient sets to improve the final maximum magnitude error if desired.)


Efficiency and timing estimates for actual calculation of complex Chebyshev approximations by the method of this paper are an important consideration in some applications. If we define an operation as consisting of a multiplication followed by an addition, then it is known¹³ that the number of operations per simplex iteration required by algorithm ACM 495 is exactly the number of equations times the number of unknowns. In our case, the number of equations is mp, and the number of unknowns is 2n if the coefficients are complex, or n if the coefficients are required to be real. Thus the operation count per iteration is either 2nmp or nmp. The number of iterations required is difficult to estimate, since it depends on the particular problem. However, in randomly generated problems, it has been observed¹³ that the number of iterations, I, is approximately the number of unknowns times some small constant c, where usually 1 < c < 3. (Similar estimates have been observed¹⁴,¹⁵ in more general linear programs as well.) Thus, in our case, I ≈ 2cn if the coefficients are complex and I ≈ cn if they are real.
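The operation-count model above is simple arithmetic. The sketch below computes the multiply-add operations per simplex iteration, which equal mp times the number of unknowns. It then multiplies by the observed iteration counts in Table AV for m = 1001 with complex coefficients, giving an implied time per operation on the original machine. The function names are illustrative, and the timing ratio reflects only that hardware.

```python
def ops_per_iteration(m, p, n, complex_coeffs):
    """(number of equations) x (number of unknowns) = m*p*(2n or n)."""
    unknowns = 2 * n if complex_coeffs else n
    return m * p * unknowns


def total_ops(iterations, m, p, n, complex_coeffs):
    """Total multiply-add count for a run with the given simplex iteration count."""
    return iterations * ops_per_iteration(m, p, n, complex_coeffs)


# Table AV, m = 1001, complex coefficients (n = 3): (p, simplex iterations, CPU seconds)
ROWS = [(2, 13, 5.00), (6, 17, 19.38), (18, 24, 105.47), (54, 28, 359.20)]
for p, iterations, cpu_s in ROWS:
    ops = total_ops(iterations, 1001, p, 3, True)
    print(f"p = {p:2d}: {ops:>9,d} operations, {1e6 * cpu_s / ops:.1f} microseconds each")
```

Across these four rows, the implied cost stays between about 31 and 41 microseconds per operation. This is roughly in line with the proportionality to total operation count assumed in the text.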

The CPU time should be proportional to the total operation count, which equals the product of the number of iterations and the number of operations per iteration. That is, we expect the CPU time to be proportional to n²mp. For the particular example here, however, we obtain an excellent fit to the limited data in Table AV with the equation


TABLE AIII. Bounds on the discrete Chebyshev error E_m(f).

                  Real coefficients           Complex coefficients
     m      p    Lower bound  Upper bound    Lower bound  Upper bound
    11      2    0.083718     0.118396       0.012089     0.017097
    11      6    0.105074     0.108780       0.013963     0.014456
    11     18    0.107307     0.107717       0.014143     0.014197
    11     54    0.107612     0.107658       0.014168     0.014174
   101      2    0.083731     0.118414       0.012252     0.017328
   101      6    0.105192     0.108893       0.014436     0.014946
   101     18    0.107556     0.107967       0.014677     0.014733
   101     54    0.107767     0.107813       0.014703     0.014709
  1001      2    0.083734     0.118418       0.012255     0.017331
  1001      6    0.105191     0.108901       0.014440     0.014950
  1001     18    0.107565     0.107976       0.014679     0.014735
  1001     54    0.107775     0.107821       0.014704     0.014712







TABLE AV. Number of simplex iterations and CPU time.

                  Real coefficients          Complex coefficients
     m      p    Simplex      CPU (s)        Simplex      CPU (s)
                 iterations                  iterations
    11      2     6             0.02          10             0.05
    11      6     8             0.08          15             0.16
    11     18    11             0.23          21             0.58
    11     54    13             0.81          27             2.25
   101      2     7             0.25          10             0.40
   101      6     9             0.73          17             1.60
   101     18    13             2.65          21             5.78
   101     54    15            11.39          28            24.27
  1001      2     9             3.05          13             5.00
  1001      6    10            10.34          17            19.38
  1001     18    13            48.16          24           105.47
  1001     54    16           170.52          28           359.20

CPU  time(ms)  =  0.128  n"3m"V 
where n = 6 if the coefficients are complex, and n = 3 if they are real. This fit was obtained by letting the exponents of n, m, and p vary separately. Other examples, however, lead us to anticipate that, more generally,

CPU  timeoc/i2(mp)‘ J, 

with a proportionality factor of the order of 0.01 to 0.03 ms, where n is either twice the number of approximation coefficients if the coefficients are complex, or exactly the number of coefficients if they are required to be real.

The CPU time estimates apply, of course, only to the DEC VAX 11/780 computer on which the calculations were performed. The virtual memory feature of this system allows very large problems to be solved; however, for sufficiently large problems, the system overhead incurred (page faulting, and so on) may significantly and adversely affect these estimates.

One method of detecting the presence of significant round-off errors is supplied by the nature of the approximation problem itself. That is, it can be proven that

$$M_{mp}(f) \le W_{mp}(f) \le M_{mp}(f)\sec[\pi/(2p)].$$

Once M_mp(f) and the coefficients have been computed in algorithm ACM 495, these bounds may be checked to see if



significant numerical round-off error has occurred. In example (A8) above (p = 6, m = 101, complex coefficients), these inequalities were observed numerically to hold to five (but not six) significant digits. We conclude that the effects of round-off errors, although visible in the results, are not significant in this example. (Single-precision numbers on the DEC VAX 11/780 have approximately seven significant decimal digits.)


'I.  Barrodak  and  C.  Phillips,  "Solution  of  an  Overdetermined  System  of 
Linear  Equations  in  the  Chebyshev  Norm,”  Algorithm  495,  ACM  Trans 
Math.  Software  1.  264-270(1975). 

*1.  Barrodak  and  C  Phillips,  “An  Improved  Algorithm  for  Discrete  Che¬ 
byshev  Linear  Approximation,"  Proceedings  of  the  Fourth  Manitoba  Con¬ 
firmer  on  Numerical  Mathtmatiex,  edited  by  B.  L.  Hartnell  and  H.  C 
Williams  (Utilitas  Math.,  1975). 

’A.  H.  Nuttall.  "Some  Windows  with  Very  Good  Siddobe  Behavior,” 
IEEE  Trans.  Acoust.  Speech  Signal  Process  ASSP-29(!)(196i|;(alsoin 
NUSC  Tech.  Rep.  6239  (9  April  1980)). 

*R.  L.  Streit  and  A.  H.  Nuttall.  “Lincar.Chebyshev  Complex  Function 
Approximation,"  NUSC  Tech.  Rep.  6403.  Nav.  Underwater  Syst  Cent., 
New  London.  CT  (26  February  1961). 

*For  an  \-element  army  and  —  r  dB  peak  side  lobes,  we  have  u0  =  (2/ 
rrlarccosll/qj  where  2q,=  |r  +  (r-  -  If")*'*  +  (r- (r»- H"1T1', 
/• >  10"“.andJlf»A'-  I. 

*J.  T.  Lewis  and  R.  L.  Streit,  "Real  Excitation  Coefficients  Suffice  for  Side- 
lobe  Control  in  a  Linear  Array,"  IEEE  Trans  Antennas  Propag.  (to  ap¬ 
pear);  (also  in  NUSC  Tech.  Memo  No.  >111(4  |I7  August,  1981)]. 

’G  .  W  McMahon.  B.  Hubley.  and  A.  Mohammed.  "Design  of  Optimum 
Directional  Arrays  Using  Linear  Programming  Techniques,"  I-  Acoust. 
Soe.  Am.  51,  304-309(1972). 

■G.  L.  Wilson,  "Computer  Optimisation  of  Transducer-Array  Patterns,” 
J.  Acoust.  Soc.  Am.  59, 195-203  (1976). 

*1.  Barrodak.  L.  M.  Delves,  and  J.  C.  Mason,  "Linear  Chebyshev  Approxi¬ 
mation  of  Complex-Valued  Functions,"  Math.  Computation  32, 853-863 
(1978). 

*90.  Mona/rfus.  Approximation  of  Function s:  Theory  and  Numerical  Meth¬ 
ods  (Springer-Verlag,  New  York.  1967),  p.  1. 

"I.  Barrodak  and  A.  Young.  “Algorithms  for  Best  L,  and  £.  Linear  Ap¬ 
proximations  on  a  Discrete  Set"  Num.  Math.  S,  295-306 11966). 
uIn  this  case,  we  observe  without  proof  that  a,  +  t/I»i  +  o,»0. 

"1.  Barrodak,  private  communication  (18  December  1980). 

14. G. B. Dantzig, Linear Programming and Extensions (Princeton U. P., Princeton, NJ, 1963), p. 160.

15. E. H. McCall, "Performance Results of the Simplex Algorithm for a Set of Real-World Linear Programming Models," Tech. Rep. 80-4, Comput. Sci. Dep., Univ. Minnesota, Minneapolis, MN (January 1980).




Optimization  Of  Discrete  Arrays 
Of  Arbitrary  Geometry 

R.  L.  Streit 




Optimization  of  discrete  arrays  of  arbitrary  geometry 

Roy  L.  Streit 

New London Laboratory, Naval Underwater Systems Center, New London, Connecticut 06320
(Received 20 November 1979; accepted for publication 4 October 1980)

The concept of Directivity Index with Beamwidth Control (DIBC) leads to a practical method for the optimization of element excitations to control the tradeoff between beamwidth and sidelobe level in a discrete array of arbitrary configuration. This optimization procedure depends on the design frequency, specified element positions, individual element field patterns, and ambient noise field. Each of these factors can be specified in a completely general manner. In addition, the optimization procedure can be adapted to computers of modest memory size by using subarrays of the full array. Examples are included to show the versatility of this approach to the optimization problem, as well as its limitations. One of these examples is a 105-element cylindrical array.

PACS numbers: 43.60.Gk, 43.30.Vh, 43.28.Tc


I.  THE  CONCEPT 
A.  Introduction 

Optimization of the element excitations of discrete antenna arrays is a matter of definition for three reasons. First, the definition of optimality will dictate the appropriate mathematical approach. Seemingly subtle changes in the definition of optimality can radically alter the applicable mathematical methods. Second, element excitations that are optimal in one sense are unlikely to be optimal in another sense. Two sets of excitations, each set optimal in its own sense, can be completely different. Third, the definition of optimality must reflect directly the primary design goals for the array. It is pointless to optimize the directivity index (DI) and then complain that the sidelobes are too high, because the design goal of low sidelobes and the definition of optimality (maximum DI) are not directly related.

This article defines and uses exclusively the concept of Directivity Index with Beamwidth Control (DIBC). Several advantages, as well as difficulties, inherent in this definition are discussed. The primary difficulty in this definition is the requirement of large computer memories for large arrays. A technique employing subarrays of the full array in a systematic manner is shown to overcome this problem. The same technique can be used to solve the following problem as well: Given an array with known element positions and excitations, and given that new elements are to be introduced at known locations, how does one excite (or drive) these new elements to improve performance of the total array without changing the excitations of any of the elements of the original array?

The optimization procedure in this article is applicable when the following premises obtain:

(1) The wavelength, $\lambda$, of the design frequency is given and fixed.

(2) The number of elements, $n$, in the array is fixed, and all the element positions $(x_k, y_k, z_k)$, $k = 1, \dots, n$, are known and fixed.

(3) Individual element field patterns at the design frequency are completely known.

199 J. Acoust. Soc. Am. 69(1), Jan. 1981


(4)  The  ambient  noise  field  at  the  design  frequency  is 
completely  known. 

(5)  Element  interactions  can  be  ignored. 

(6) Element excitations can be phased (i.e., complex).

The premise that the element excitations must be allowed to be phased is not necessary. As is pointed out later, we can just as easily require them to be strictly real, i.e., either positive or negative. However, except where noted, we assume that the excitations are phased because this is the more general situation and allows for better performance.

The concept of DIBC has been defined and used earlier by Butler and Unz (Refs. 1 and 2). In these papers, DIBC is called beam efficiency and is defined by them only for line arrays. This article is new in three regards. First, we apply the concept of DIBC to arbitrary spatial arrays and thereby demonstrate its usefulness in very general situations. Second, we exhibit viable numerical procedures and techniques for overcoming a variety of mathematical difficulties inherent in the concept of maximizing DIBC. Third, the above-mentioned method of optimizing DIBC for general spatial arrays of any number of elements, while using only small amounts of core storage (and no peripheral storage devices), appears to be completely novel to this article.

All the examples in this article were computed on the Univac 1108 under EXEC 8. A listing of the computer program is available in Streit (Ref. 3). It is written in FORTRAN V for the general three-dimensional array of arbitrary configuration.

B.  Field  patterns  and  coordinate  system 

The spherical coordinate system of Fig. 1 is used throughout this article; however, a particular direction $(\theta, \phi)$ will be specified by the direction cosines

$$\cos\alpha = \sin\phi\cos\theta, \qquad \cos\beta = \sin\phi\sin\theta, \qquad \cos\gamma = \cos\phi. \qquad (1)$$

The most general field pattern treated here is

$$V(\theta,\phi) = \sum_{k=1}^{n} a_k R_k(\theta,\phi) \exp\left[ j\frac{2\pi}{\lambda} d_k(\theta,\phi) \right], \qquad (2)$$

where $R_k(\theta,\phi)$ is the phased (complex) response of the $k$th element, and

$$d_k(\theta,\phi) = x_k\cos\alpha + y_k\cos\beta + z_k\cos\gamma. \qquad (3)$$

FIG. 1. The coordinate system.

Because of assumptions (1) to (6), the field pattern $V(\theta,\phi)$ depends solely on the phased (complex) excitations $a_k$. The ambient noise field $N(\theta,\phi)$ will enter in the definition of optimal excitations. (Alternately, one may think of $N(\theta,\phi)$ as a given non-negative weighting function of the two angles.)
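As a minimal sketch of Eqs. (1) to (3), the Python/NumPy function below evaluates the field pattern $V(\theta,\phi)$ for arbitrary element positions. It assumes, as the integrals in (4) suggest, that $\phi$ is the polar angle and $\theta$ the azimuth, and it defaults to omnidirectional elements ($R_k = 1$). The function names and the four-element example are illustrative only and are not taken from the program of Ref. 3.

```python
import numpy as np

def direction_cosines(theta, phi):
    """Eq. (1); phi is taken as the polar angle and theta as the azimuth (assumption)."""
    return np.array([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)])

def field_pattern(a, positions, wavelength, theta, phi, response=None):
    """Eq. (2): V = sum_k a_k R_k exp[j (2 pi / lambda) d_k].

    a          -- (n,) complex element excitations
    positions  -- (n, 3) element coordinates (x_k, y_k, z_k)
    response   -- optional callable response(k, theta, phi) giving R_k;
                  omnidirectional elements (R_k = 1) are assumed if None
    """
    a = np.asarray(a, dtype=complex)
    d = np.asarray(positions, dtype=float) @ direction_cosines(theta, phi)  # Eq. (3)
    if response is None:
        r = np.ones(len(a), dtype=complex)
    else:
        r = np.array([response(k, theta, phi) for k in range(len(a))], dtype=complex)
    return np.sum(a * r * np.exp(1j * 2.0 * np.pi / wavelength * d))

# Four omnidirectional elements on the z axis, half-wavelength spacing, uniform excitations.
positions = np.array([[0.0, 0.0, 0.5 * k] for k in range(4)])
broadside = field_pattern(np.ones(4), positions, 1.0, 0.0, np.pi / 2)  # all phases equal
endfire = field_pattern(np.ones(4), positions, 1.0, 0.0, 0.0)          # phases 0, pi, 2pi, 3pi
print(abs(broadside), abs(endfire))
```

At broadside the four contributions add in phase, while at endfire they alternate in sign and cancel. Directional elements are handled by passing a callable for `response`; the noise field $N(\theta,\phi)$ enters only later, as a weight.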

C.  Directivity  index  with  beamwidth  control  (DIBC) 

The antenna designer is required to divide the set of all directions, denoted $\Omega$, into three disjoint regions:

$\mathcal{M}$ = mainlobe region,

$\mathcal{S}$ = sidelobe region,

$\mathcal{I}$ = ignored region = $\Omega - (\mathcal{M} \cup \mathcal{S})$.

This division of directional space is completely arbitrary, except that neither $\mathcal{M}$ nor $\mathcal{S}$ can be empty, whereas $\mathcal{I}$ can be empty if desired. Once a particular choice of $\mathcal{M}$, $\mathcal{S}$, and $\mathcal{I}$ has been made, the following definition of optimality is used.

Definition 1: The element excitations $a_1, \dots, a_n$ are optimal excitations for a given choice of regions $\mathcal{M}$, $\mathcal{S}$, and $\mathcal{I}$ if and only if the ratio

$$\mathrm{DIBC} = \frac{\iint_{\mathcal{M}} N(\theta,\phi)\,|V(\theta,\phi)|^2 \sin\phi\, d\phi\, d\theta}{\iint_{\mathcal{M}\cup\mathcal{S}} N(\theta,\phi)\,|V(\theta,\phi)|^2 \sin\phi\, d\phi\, d\theta} \qquad (4)$$

is maximized. Any ratio of this form will be referred to as a directivity index with beamwidth control.

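We point out that any excitations $a_1, \dots, a_n$ that maximize the DIBC ratio (4) also maximize the ratio

$$\frac{\iint_{\mathcal{M}} N(\theta,\phi)\,|V(\theta,\phi)|^2 \sin\phi\, d\phi\, d\theta}{\iint_{\mathcal{S}} N(\theta,\phi)\,|V(\theta,\phi)|^2 \sin\phi\, d\phi\, d\theta}. \qquad (4a)$$

To see this, note that, because $\mathcal{M}$ and $\mathcal{S}$ are disjoint,

$$\frac{1}{\mathrm{DIBC}} = 1 + \frac{\iint_{\mathcal{S}} N(\theta,\phi)\,|V(\theta,\phi)|^2 \sin\phi\, d\phi\, d\theta}{\iint_{\mathcal{M}} N(\theta,\phi)\,|V(\theta,\phi)|^2 \sin\phi\, d\phi\, d\theta},$$

so that any excitations minimizing the reciprocal of DIBC are also excitations that minimize the reciprocal of the ratio (4a), and this proves our assertion. This is not to say, of course, that the maximum value of (4) and the maximum value of (4a) are equal, only that excitations that maximize the one also maximize the other.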

Maximizing DI is a limiting case of maximizing DIBC. To see this, recall that for a specified direction $(\theta_0, \phi_0)$, DI is a maximum if the ratio

$$\frac{N(\theta_0,\phi_0)\,|V(\theta_0,\phi_0)|^2}{\iint_{\Omega} N(\theta,\phi)\,|V(\theta,\phi)|^2 \sin\phi\, d\phi\, d\theta}$$

is maximized. Now let the ignored region $\mathcal{I}$ be empty, let the mainlobe region $\mathcal{M}$ contain $(\theta_0, \phi_0)$, and let $\mathcal{S} = \Omega - \mathcal{M}$. Then, excitations maximizing DIBC converge to excitations that maximize DI as the mainlobe region $\mathcal{M}$ shrinks down on the point $(\theta_0, \phi_0)$.


We have defined optimal excitations as those for which DIBC is maximized for some choice of regions $\mathcal{M}$, $\mathcal{S}$, and $\mathcal{I}$. This allows a measure of control over the beamwidth and sidelobe level. By varying systematically the choice of $\mathcal{M}$ and $\mathcal{S}$ and maximizing the DIBC for each choice, we can examine directly the tradeoff between beamwidth and sidelobe level for the particular array at hand. The engineer can then select the excitations that best suit his needs. Generally, the larger the mainlobe region $\mathcal{M}$ and the smaller the sidelobe region $\mathcal{S}$ (for fixed ignored region $\mathcal{I}$), the lower the overall sidelobe level and the greater the beamwidth. However, this may not always be the case, since sidelobe level does not enter directly into the DIBC ratio of (4). Nothing prevents the field pattern from having narrow high-amplitude sidelobes, since such sidelobes contribute little to the integral in the denominator of the DIBC.


Another reason for maximizing DIBC is simply that it is conceptually easy to do so. All that is required is the solution of an eigenvalue-eigenvector problem (see Theorem 1), and problems of this type have been studied extensively in the literature (Ref. 4). Numerically, such problems require considerable care. Fortunately, well-designed computer programs are available for the solution of eigenproblems (Refs. 5 and 6). With the use of these routines, the solutions of the eigenproblems encountered in the antenna problem seem to be numerically stable. This is not to say that there may not be arrays that yield numerically unstable eigenproblems.

A final reason for maximizing DIBC is more esoteric. In the process of solving the required eigenproblem, all the eigenvalue-eigenvector pairs are computed, not merely the largest one. It happens that the field patterns corresponding to the lower-order eigenvalues have some interesting features [see the figures in example (2)]. In addition, it often happens that some of the larger eigenvalues are close together; i.e., several linearly independent sets of excitations exist which give DIBC values that lie close together. (For an analogous situation, see Slepian and Pollak, Ref. 7.) What this means in the antenna problem is that, without sacrificing antenna performance (as measured solely by the DIBC), it becomes a simple matter to examine numerous different sets of excitations with the aim of improving some completely different design goal of the array. [See (19) below.] This will not be discussed further in this article.

It must be mentioned that this approach to the array optimization problem does not attempt to address several issues that are of practical interest. First, this approach does not guarantee that the array performance is insensitive to perturbations in the optimum excitations. The question of sensitivity to excitation perturbation can be examined only after the optimum excitations are found. Second, this approach does not attempt to control the efficiency of the array. In other words, it can happen that the optimal excitations for a particular array may drive certain elements at their maximum allowed levels while the remaining elements are hardly driven at all, so that the total output power of the array is too low for the application. This problem is common to all amplitude-shaded arrays and can be examined after the optimum excitations are found. Finally, this approach to array optimization ignores element interactions, so that it is possible for optimum excitations derived by this method (or by any other method, for that matter) to have undesirable characteristics in this regard. This possibility, as well as the other two possibilities mentioned above, should be investigated after optimal excitations are found.

D.  Computer  storage  problem 

The primary drawback to maximizing DIBC is that the number of computer storage locations required (using the program in Streit, Ref. 3) is approximately

$$N_T \approx 6n^2 + 16n + 12\,000 \ \text{words}, \qquad (6)$$

for the case of constant ambient noise field and omnidirectional elements. Since the total requirement will grow as the ambient noise field and/or element field patterns require more storage to compute, it appears that the direct computation of optimal excitations for any array of 100 or more elements requires either large main-frame computers or computers with virtual memory. However, these storage requirements can be circumvented. A technique known as group coordinate relaxation gives a method that can be tailored to the computer memory available. The technique is an excellent example of how to trade off computer memory for computational speed. The more memory available, the faster the DIBC can be maximized.

Group coordinate relaxation, in the context of maximizing DIBC, is simply stated. Suppose there are 300 elements in the array. Make any initial guess at the optimal excitations. Define distinct subarrays of, say, 50 elements each. By working with the first of these subarrays, new element excitations are computed for these 50 elements so that the DIBC of the entire 300-element array is increased. Next, new excitations are computed for the second subarray. Cycling through all six subarrays in turn, until the DIBC for the entire 300-element array cannot be increased further by changing the excitations in any of the subarrays, is the essence of group coordinate relaxation. The method can be proved to be convergent. It yields the globally best excitations, not merely locally best ones. A careful statement of the algorithm and further remarks are given in the subsection on numerical solution of the eigenproblem by the indirect method.
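The cycle just described can be sketched in Python/NumPy as block coordinate ascent on the ratio $\bar a^T U a / \bar a^T W a$ to which DIBC reduces in Sec. II A. This is an illustrative construction under stated assumptions, not the algorithm of the later subsection: for each subarray, the excitations outside it are held fixed up to one common scale factor, so each step is a small generalized eigenproblem of order (subarray size + 1). Because the current excitations are always a candidate, the ratio can never decrease. For brevity the sketch forms the reduced matrices from the full $U$ and $W$; a memory-limited implementation would accumulate them directly. All function names and the random test matrices are hypothetical.

```python
import numpy as np

def max_generalized_eig(A, B):
    """Largest eigenpair of A y = mu B y (A Hermitian, B Hermitian positive definite),
    via the Cholesky reduction B = L L^H."""
    L = np.linalg.cholesky(B)
    X = np.linalg.solve(L, A)                        # L^{-1} A
    C = np.linalg.solve(L, X.conj().T).conj().T      # L^{-1} A L^{-H}
    mu, Q = np.linalg.eigh((C + C.conj().T) / 2)     # eigenvalues in ascending order
    return mu[-1], np.linalg.solve(L.conj().T, Q[:, -1])

def ratio(a, U, W):
    return (a.conj() @ U @ a).real / (a.conj() @ W @ a).real

def group_coordinate_relaxation(U, W, a0, blocks, sweeps=100):
    """Cycle through the subarrays (index lists in `blocks`), never lowering the ratio."""
    a = np.asarray(a0, dtype=complex).copy()
    n = len(a)
    history = [ratio(a, U, W)]
    for _ in range(sweeps):
        for block in blocks:
            c = a.copy()
            c[block] = 0.0                            # excitations outside the subarray
            cols = [np.eye(n)[:, k] for k in block]   # free excitations inside the subarray
            if np.linalg.norm(c) > 0:
                cols.append(c)                        # one common scale factor for the rest
            P = np.column_stack(cols).astype(complex) # new excitations are a = P y
            _, y = max_generalized_eig(P.conj().T @ U @ P, P.conj().T @ W @ P)
            a = P @ y
            a /= np.linalg.norm(a)                    # the ratio is scale invariant
        history.append(ratio(a, U, W))
    return a, history

# Hypothetical 8-element test problem: U has one dominant direction v.
rng = np.random.default_rng(0)
n = 8
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
U = np.eye(n) + 10.0 * np.outer(v, v.conj()) / np.vdot(v, v).real
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = np.eye(n) + 0.1 * (G @ G.conj().T) / n
a_best, history = group_coordinate_relaxation(U, W, np.ones(n), [[0, 1], [2, 3], [4, 5], [6, 7]])
mu_max, _ = max_generalized_eig(U, W)
print(history[0], history[-1], mu_max)
```

The final ratio approaches the largest generalized eigenvalue, which is the global optimum by Theorem 1; storage for each step scales with the square of the subarray size rather than of the full array.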

The rate of convergence of the group coordinate relaxation method depends heavily on the size of the subarrays used. The larger the subarrays, the faster the convergence, and the more core storage required. Thus, core storage is traded off in a direct manner for the convergence rate and, hence, for computation time. In addition, each step of the group coordinate relaxation method produces new excitations that increase the DIBC, so that if the computations are interrupted for any reason: (1) the last computed excitations are better than any of the excitations previously computed, and (2) by saving the last computed excitations, the computations can be resumed without significant loss.

If $n_s$ is the number of elements in a subarray used by the group coordinate relaxation process, the total storage required (using the program in Streit, Ref. 3) is approximately

$$N_s \approx 6n_s^2 + 8(n + n_s) + 12\,000 \ \text{words}, \qquad (7)$$

for the case of constant ambient noise field and omnidirectional elements. Thus, memory requirements grow as the square of the subarray size but only linearly with the size of the full array. By choosing the subarray size sufficiently small, the designer can maximize DIBC for large arrays on computers of modest size.

The cost, however, is computer time. On the other hand, if the designer has a dedicated minicomputer of reasonable size, the cost of computer time is nil.
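Formulas (6) and (7) are simple enough to tabulate directly. The sketch below (function names are illustrative) evaluates both for the array sizes mentioned above. Note that (7) reduces to (6) when the subarray is the full array, and that both counts apply only to the program of Ref. 3 with constant noise and omnidirectional elements.

```python
def direct_storage_words(n):
    """Eq. (6): approximate words to maximize DIBC directly for an n-element array."""
    return 6 * n**2 + 16 * n + 12_000

def relaxation_storage_words(n, ns):
    """Eq. (7): approximate words when subarrays of ns elements are used."""
    return 6 * ns**2 + 8 * (n + ns) + 12_000

for n, ns in [(105, 35), (300, 50), (300, 100)]:
    print(f"n={n:3d}: direct {direct_storage_words(n):7d} words, "
          f"subarrays of {ns}: {relaxation_storage_words(n, ns):6d} words")
```

For $n = 300$, (6) gives 556 800 words, whereas 50-element subarrays need 29 800 words by (7), roughly a factor of 19 less.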

II.  ELABORATION  OF  THE  CONCEPT 
A. DIBC and the eigenproblem

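Let the vector $a = (a_1, \dots, a_n)^T$ be the vector of element excitations for the field pattern $V(\theta,\phi)$ given by (2). Then

$$\iint_{\mathcal{M}} N(\theta,\phi)\,|V(\theta,\phi)|^2 \sin\phi\, d\phi\, d\theta = \iint_{\mathcal{M}} N(\theta,\phi) \left| \sum_{k=1}^{n} a_k R_k(\theta,\phi) \exp\left[ j\frac{2\pi}{\lambda} d_k(\theta,\phi) \right] \right|^2 \sin\phi\, d\phi\, d\theta$$

$$= \sum_{k=1}^{n} \sum_{l=1}^{n} \bar a_k a_l \iint_{\mathcal{M}} N(\theta,\phi)\, \bar R_k(\theta,\phi) R_l(\theta,\phi) \exp\left\{ j\frac{2\pi}{\lambda} \left[ d_l(\theta,\phi) - d_k(\theta,\phi) \right] \right\} \sin\phi\, d\phi\, d\theta = \bar a^T U a, \qquad (8)$$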




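where $U$ is an $n \times n$ complex matrix. If $U = [u_{kl}]$, with $k$ denoting the row number and $l$ denoting the column number, then

$$u_{kl} = \iint_{\mathcal{M}} N(\theta,\phi)\, \bar R_k(\theta,\phi) R_l(\theta,\phi) \exp\left\{ j\frac{2\pi}{\lambda} \left[ d_l(\theta,\phi) - d_k(\theta,\phi) \right] \right\} \sin\phi\, d\phi\, d\theta. \qquad (9)$$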

Clearly, $U$ is a Hermitian matrix (i.e., $U = \bar U^T$), since it is obvious that $u_{kl} = \bar u_{lk}$. Also, $U$ is positive definite, since

$$\bar a^T U a = \iint_{\mathcal{M}} N(\theta,\phi)\,|V(\theta,\phi)|^2 \sin\phi\, d\phi\, d\theta > 0, \qquad (10)$$

whenever the excitation vector $a \neq 0$ (and provided the mainlobe region $\mathcal{M}$ is not a set of measure zero, a pathological condition that is not encountered in this application). Therefore, for every mainlobe region $\mathcal{M}$, the matrix $U$ defined in (9) is an $n \times n$ positive definite Hermitian matrix. Similarly,

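$$\iint_{\mathcal{M}\cup\mathcal{S}} N(\theta,\phi)\,|V(\theta,\phi)|^2 \sin\phi\, d\phi\, d\theta = \bar a^T W a, \qquad (11)$$

where $W = [w_{kl}]$ is an $n \times n$ positive definite Hermitian matrix whose general entry is

$$w_{kl} = \iint_{\mathcal{M}\cup\mathcal{S}} N(\theta,\phi)\, \bar R_k(\theta,\phi) R_l(\theta,\phi) \exp\left\{ j\frac{2\pi}{\lambda} \left[ d_l(\theta,\phi) - d_k(\theta,\phi) \right] \right\} \sin\phi\, d\phi\, d\theta. \qquad (12)$$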


Thus, for a given choice of $\mathcal{M}$, $\mathcal{S}$, and $\mathcal{I}$, we have

$$\mathrm{DIBC} = \frac{\bar a^T U a}{\bar a^T W a}, \qquad (13)$$


which  is  a  ratio  of  positive  definite  Hermitian  forms. 
Therefore,  optimal  excitations  are  those  that  maximize 
this  ratio  of  Hermitian  forms. 

The mathematical tools for handling ratios of the form (13) have been known for at least a century. We have the following general mathematical result.

Theorem 1: If $U$ and $W$ are $n \times n$ Hermitian matrices and $W$ is positive definite, then the eigenvalues of the generalized eigenproblem

$$U z = \mu W z \qquad (14)$$

are all real. Let $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_n$ denote these eigenvalues. Then, linearly independent vectors $z_1, \dots, z_n$ can be found that satisfy

$$U z_k = \mu_k W z_k, \qquad k = 1, \dots, n, \qquad (15)$$

and

$$\bar z_k^T W z_j = \begin{cases} 1, & \text{if } k = j, \\ 0, & \text{if } k \neq j. \end{cases} \qquad (16)$$

The vectors $z_1, \dots, z_n$ are called the eigenvectors of the eigenproblem (14). Also, we have

$$\max_{z \neq 0} \frac{\bar z^T U z}{\bar z^T W z} = \mu_1, \qquad (17)$$

and this maximum is attained for every eigenvector corresponding to $\mu_1$, and

$$\min_{z \neq 0} \frac{\bar z^T U z}{\bar z^T W z} = \mu_n, \qquad (18)$$

and this minimum is attained for every eigenvector corresponding to $\mu_n$. Finally, if $1 \le l \le n$, then, for any constants $\alpha_1, \dots, \alpha_l$, not all zero, we have

$$\frac{\bar z^T U z}{\bar z^T W z} \ge \mu_l, \qquad (19)$$

where $z = \alpha_1 z_1 + \cdots + \alpha_l z_l$.

The proofs of the various parts of this theorem can be found in numerous sources, e.g., Gantmacher.

For the immediate purposes, the most important part of this theorem is (17). It states that optimal excitations are precisely the components of any eigenvector corresponding to the largest eigenvalue of the generalized eigenproblem $Uz = \mu W z$, where $U$ and $W$ are defined by (9) and (12).

Theoretically, Theorem 1 solves the problem of maximizing DIBC in the case where all element excitations can be phased. But what is the solution if all the excitations are required to be real (positive or negative)?

In this case, the ratio (13) still holds, but the excitation vector $a$ is real, i.e., $a = \bar a$. Since $U$ and $W$ are Hermitian, their imaginary parts are antisymmetric and contribute nothing to a real quadratic form, so we have the algebraic identity

$$\frac{a^T U a}{a^T W a} = \frac{a^T (\mathrm{Re}\,U)\, a}{a^T (\mathrm{Re}\,W)\, a}. \qquad (20)$$

Now, $\mathrm{Re}\,U$ and $\mathrm{Re}\,W$ are both real symmetric matrices, and all the properties of Theorem 1 hold for the real generalized eigenproblem $(\mathrm{Re}\,U) z = \mu (\mathrm{Re}\,W) z$. The only difference is that now the eigenvectors have all real components. Therefore, if the excitations are required to be real, the optimal real excitations are precisely the components of any eigenvector corresponding to the largest eigenvalue of the generalized real eigenproblem

$$(\mathrm{Re}\,U) z = \mu (\mathrm{Re}\,W) z, \qquad (21)$$

where $U$ and $W$ are the matrices defined by (9) and (12).
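Theorem 1 and identity (20) are easy to check numerically. The NumPy sketch below is illustrative only: it uses randomly generated Hermitian $U$ and positive definite $W$ (not matrices computed from (9) and (12)), and, for brevity, it reduces (14) with the symmetric square root $W^{-1/2}$, which is a different reduction from the Cholesky route of Sec. II C. The helper names are hypothetical. The checks cover (15) and (16), the extremal property (17), the bound (19), identity (20), and the fact that the best real excitations from (21) can do no better than the best phased ones.

```python
import numpy as np

def generalized_eigh(U, W):
    """Eigenpairs of U z = mu W z (Eq. 14), U Hermitian, W Hermitian positive definite.
    Uses W^{-1/2} = Q diag(w^{-1/2}) Q^H; eigenvalues are returned in descending order."""
    w, Q = np.linalg.eigh(W)
    W_inv_half = (Q / np.sqrt(w)) @ Q.conj().T
    C = W_inv_half @ U @ W_inv_half
    mu, X = np.linalg.eigh((C + C.conj().T) / 2)   # real eigenvalues, ascending
    Z = W_inv_half @ X                               # W-orthonormal columns, Eq. (16)
    return mu[::-1], Z[:, ::-1]

def rayleigh(z, U, W):
    return (z.conj() @ U @ z).real / (z.conj() @ W @ z).real

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U = A @ A.conj().T                        # Hermitian test matrix
W = B @ B.conj().T + n * np.eye(n)        # Hermitian positive definite test matrix

mu, Z = generalized_eigh(U, W)                        # phased excitations, Eq. (14)
mu_real, Z_real = generalized_eigh(U.real, W.real)    # real excitations, Eq. (21)

# Eq. (19): a combination of the first l eigenvectors has a ratio of at least mu_l.
l = 3
alpha = rng.standard_normal(l) + 1j * rng.standard_normal(l)
z_mix = Z[:, :l] @ alpha

# Eq. (20): for a real vector the imaginary parts of U and W drop out.
a = rng.standard_normal(n)
lhs = (a @ U @ a) / (a @ W @ a)
rhs = (a @ U.real @ a) / (a @ W.real @ a)
print(mu[0], mu_real[0], rayleigh(z_mix, U, W), mu[l - 1], lhs, rhs)
```

In this setting `mu_real[0] <= mu[0]` always holds, since real excitations are a special case of phased ones; how much is lost depends on the array.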

In the remainder of this article, we concern ourselves only with phased excitations. Everything that we do, however, can be recast for real excitations simply by using the real parts of the matrices involved.

A discrete reformulation of DIBC is discussed in the next section. By way of analogy only, this discrete version of the DIBC ratio is to DIBC as the discrete Fourier transform is to the Fourier transform. Following this is a discussion of the numerical methods for the solution of the kind of eigenproblems encountered in this article.

B.  A  discrete  version  of  DIBC 

Maximizing the DIBC ratio (4) is mathematically tractable, but it is not practical. It requires the solution of an eigenproblem, which in turn requires the evaluation of approximately $n^2$ double integrals (9) and (12) over subsets of the unit sphere. Since it is essential that the mainlobe region $\mathcal{M}$ and the sidelobe region $\mathcal{S}$ be quite general in nature (i.e., be defined to suit the particular application), these double integrals are in general impossible to evaluate explicitly and are also difficult and time consuming to evaluate accurately by numerical methods. For these reasons, DIBC itself is not optimized. What is optimized is a discrete version (DIBCF) of DIBC that is not only numerically practical to use, but is also conceptually simple.

FIG. 2. The icosahedron.

The discrete DIBC definition replaces the surface integrals in ratio (4) by discrete sums over points chosen in $\mathcal{M}$ and $\mathcal{S}$. Since $\mathcal{M}$ and $\mathcal{S}$ are not known a priori, these points are distributed uniformly over the surface of the sphere, with each point contributing one term to the discrete sum and all terms entering with equal weight. Ideally, then, these points must show no directional bias and must be easy to compute. Furthermore, it must be possible to choose these points with any desired density on the sphere.

A natural choice for points fulfilling these conditions is easy to describe, but difficult to compute. Choose as points the equilibrium positions of a finite number of positive charges constrained to lie on the surface of the unit sphere. When the number of positive charges is 4, 6, or 12, the stable positions are the vertices of the tetrahedron, the octahedron, and the icosahedron, respectively; for 8 and 20 charges, the vertices of the cube and the dodecahedron are equilibrium positions, but not the minimum-energy ones. Unfortunately, these are the only easy cases (see Melnyk et al.).

The discrete points chosen to define discrete DIBC are the vertices of a geodesic dome. Consider the icosahedron shown in Fig. 2. Note that in this figure the $y$ axis is in the plane of the paper and the $z$ axis is tilted slightly to show off the configuration. (The $x$ axis is not shown, but is, of course, orthogonal to the $yz$ plane.) This regular figure has 12 vertices, 20 faces, and 30 edges. Geodesic domes with (almost) any number of faces are constructed from the icosahedron by subdividing its equilateral triangular faces in a systematic manner (Ref. 10). First, subdivide each face into congruent equilateral subtriangles, as shown in Fig. 3; i.e., for each positive integer $p \ge 1$, find $p + 1$ equispaced points along each edge and pass lines through each of these points parallel to the other two edges. Next, take all the vertices of the equilateral subtriangles so generated and project them on the unit sphere. By doing this for each face of the icosahedron for a fixed integer $p \ge 1$, we construct the vertices of a geodesic dome of order $p$. We define the Fuller points, $\mathcal{F}_p$, to be the totality of these points.

FIG. 3. One face of icosahedron subdivided into p parts.

The Fuller points, $\mathcal{F}_p$, are uniquely oriented in Cartesian space once the vertices of the icosahedron are defined. With some simple trigonometry, it can be seen that the 12 vertices of an icosahedron inscribed in a sphere of unit radius can be taken to be the 2 points $(0, 0, \pm 1)$, together with the 10 points

$$\left[\, 2b(1-b^2)^{1/2}\cos\frac{2\pi k}{5},\ \ 2b(1-b^2)^{1/2}\sin\frac{2\pi k}{5},\ \ 2b^2 - 1 \,\right],$$

$$\left[\, 2b(1-b^2)^{1/2}\cos\frac{2\pi(k+\frac{1}{2})}{5},\ \ 2b(1-b^2)^{1/2}\sin\frac{2\pi(k+\frac{1}{2})}{5},\ \ 1 - 2b^2 \,\right], \qquad (22)$$

where $k = 0, 1, \dots, 4$ and $b = 1/[2\cos(3\pi/10)] = 2/\sqrt{10 - 2\sqrt{5}}$. The edge length of this icosahedron is $2\sqrt{1 - b^2} \approx 1.0515$.


How many points are there in $\mathcal{F}_p$? By inspecting an unfolded paper model of the icosahedron on which the Fuller points have been marked, it is easy to see that $\mathcal{F}_p$ contains exactly $10p^2 + 2$ points. Thus, the number of steradians per point is approximately $4\pi/(10p^2) \approx 1.25/p^2$.
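The construction of $\mathcal{F}_p$ can be sketched in a few lines of NumPy. The sketch builds the 12 vertices from (22), identifies the 20 faces as the vertex triples whose three sides all equal the edge length, subdivides each face with barycentric weights $(i, j, p-i-j)/p$, projects onto the unit sphere, and discards points shared between faces. It is illustrative only (the de-duplication is quadratic in the number of points, and the function names are hypothetical), but it confirms the count $10p^2 + 2$ and the edge length 1.0515 quoted above.

```python
import itertools
import numpy as np

def icosahedron_vertices():
    """Eq. (22): 12 vertices of the icosahedron inscribed in the unit sphere."""
    b = 1.0 / (2.0 * np.cos(3.0 * np.pi / 10.0))
    r = 2.0 * b * np.sqrt(1.0 - b**2)             # radius of the two five-vertex rings
    verts = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
    for k in range(5):
        t1 = 2.0 * np.pi * k / 5.0
        t2 = 2.0 * np.pi * (k + 0.5) / 5.0
        verts.append((r * np.cos(t1), r * np.sin(t1), 2.0 * b**2 - 1.0))
        verts.append((r * np.cos(t2), r * np.sin(t2), 1.0 - 2.0 * b**2))
    return np.array(verts), 2.0 * np.sqrt(1.0 - b**2)   # vertices and edge length

def fuller_points(p):
    """Vertices of the order-p geodesic dome (p >= 1), projected on the unit sphere."""
    V, edge = icosahedron_vertices()
    faces = [f for f in itertools.combinations(range(12), 3)
             if all(abs(np.linalg.norm(V[i] - V[j]) - edge) < 1e-9
                    for i, j in itertools.combinations(f, 2))]
    pts = []
    for i0, i1, i2 in faces:
        for i in range(p + 1):
            for j in range(p + 1 - i):
                q = (i * V[i0] + j * V[i1] + (p - i - j) * V[i2]) / p
                q = q / np.linalg.norm(q)                  # project on the unit sphere
                if all(np.linalg.norm(q - s) > 1e-9 for s in pts):
                    pts.append(q)                          # skip points shared by faces
    return np.array(pts)

for p in (1, 2, 3):
    print(p, len(fuller_points(p)), 10 * p**2 + 2)
```

The count also follows from Euler's formula: the $20p^2$ subtriangles have $30p^2$ edges, so there are $30p^2 - 20p^2 + 2 = 10p^2 + 2$ vertices.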


Notice that the Fuller points, $\mathcal{F}_p$, are not quite ideal. Those points chosen near the center of a face of the original icosahedron will be less finely spaced when projected on the sphere than will those points that were chosen nearer an edge. This defect in $\mathcal{F}_p$ does not seem to be significant in this application. With the Fuller points defined, we state the following.

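Definition 2: For a given integer $p \ge 1$, and regions $\mathcal{M}$, $\mathcal{S}$, and $\mathcal{I}$, the element excitations $a_1, \dots, a_n$ are optimal if and only if the ratio

$$\mathrm{DIBCF} = \frac{\displaystyle\sum_{(\theta,\phi) \in \mathcal{F}_p \cap \mathcal{M}} N(\theta,\phi)\,|V(\theta,\phi)|^2}{\displaystyle\sum_{(\theta,\phi) \in \mathcal{F}_p \cap (\mathcal{M} \cup \mathcal{S})} N(\theta,\phi)\,|V(\theta,\phi)|^2} \qquad (23)$$

is maximized. Any ratio of the form of (23) will be referred to as a directivity index with beamwidth control over the Fuller points, $\mathcal{F}_p$.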


Note that, as $p \to \infty$, we do not have $\mathrm{DIBCF} \to \mathrm{DIBC}$, because the distribution of the Fuller points does not approach the uniform distribution as $p$ gets large. In addition, we point out that for every $p \ge 1$ we have the inequalities

$$0 \le \mathrm{DIBCF} \le 1,$$

provided only that the denominator sum in (23) is nonzero. The proof of the lower bound is trivial, and the proof of the upper bound follows from the observation that every summand in the numerator of (23) appears also in the denominator, and all summands are nonnegative.

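The formulation of DIBCF as an eigenproblem parallels that for DIBC. Specifically, we have

$$M z = \mu S z, \qquad (24)$$

where $M = [m_{kl}]$ and $S = [s_{kl}]$ are $n \times n$ nonnegative definite Hermitian matrices with

$$m_{kl} = \sum_{(\theta,\phi) \in \mathcal{F}_p \cap \mathcal{M}} N(\theta,\phi)\, \bar R_k(\theta,\phi) R_l(\theta,\phi) \exp\left\{ j\frac{2\pi}{\lambda} \left[ d_l(\theta,\phi) - d_k(\theta,\phi) \right] \right\}, \qquad (25)$$

$$s_{kl} = \sum_{(\theta,\phi) \in \mathcal{F}_p \cap (\mathcal{M} \cup \mathcal{S})} N(\theta,\phi)\, \bar R_k(\theta,\phi) R_l(\theta,\phi) \exp\left\{ j\frac{2\pi}{\lambda} \left[ d_l(\theta,\phi) - d_k(\theta,\phi) \right] \right\}. \qquad (26)$$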

By Theorem 1, maximizing DIBCF requires the computation of any eigenvector corresponding to the largest eigenvalue of the eigenproblem (24). The numerical solution of (24) is discussed fully in the next section.
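Equations (25) and (26) amount to two weighted sums of outer products of steering vectors. The NumPy sketch below assumes omnidirectional elements ($R_k = 1$), a unit noise field unless weights are supplied, and, purely to keep it short, a Fibonacci-lattice set of directions as a stand-in for the Fuller points. The array geometry, region masks, and function names are hypothetical. It checks that $\bar a^T M a / \bar a^T S a$ reproduces the direct sum (23) and lies between 0 and 1.

```python
import numpy as np

def sphere_directions(m):
    """m roughly uniform unit vectors (Fibonacci lattice), used here in place of F_p."""
    k = np.arange(m) + 0.5
    z = 1.0 - 2.0 * k / m
    azimuth = np.pi * (1.0 + np.sqrt(5.0)) * k
    s = np.sqrt(1.0 - z**2)
    return np.column_stack([s * np.cos(azimuth), s * np.sin(azimuth), z])

def dibcf_matrices(positions, wavelength, dirs, in_main, in_side, noise=None):
    """Eqs. (25)-(26) with R_k = 1: returns (M, S) for boolean region masks over dirs."""
    N = np.ones(len(dirs)) if noise is None else np.asarray(noise, dtype=float)
    # E[i, k] = exp[j (2 pi / lambda) d_k] at sample direction i, with d_k from Eq. (3)
    E = np.exp(1j * 2.0 * np.pi / wavelength * (dirs @ positions.T))

    def weighted_gram(mask):
        Em = E[mask]
        return (Em.conj().T * N[mask]) @ Em      # entry (k, l): sum_i N_i conj(E_ik) E_il
    return weighted_gram(in_main), weighted_gram(in_main | in_side)

# Hypothetical example: 6 elements on the z axis at half-wavelength spacing;
# mainlobe = directions within about 11.5 degrees of the broadside plane; no ignored region.
positions = np.array([[0.0, 0.0, 0.5 * k] for k in range(6)])
dirs = sphere_directions(2000)
in_main = np.abs(dirs[:, 2]) < 0.2
in_side = ~in_main
M, S = dibcf_matrices(positions, 1.0, dirs, in_main, in_side)

a = np.ones(6, dtype=complex)                    # uniform excitations
dibcf = (a.conj() @ M @ a).real / (a.conj() @ S @ a).real
V = np.exp(1j * 2.0 * np.pi * (dirs @ positions.T)) @ a    # pattern at every sample point
direct = np.sum(np.abs(V[in_main])**2) / np.sum(np.abs(V)**2)
print(dibcf, direct)
```

Replacing `sphere_directions` with the Fuller points $\mathcal{F}_p$, and supplying element responses and noise weights, would give the matrices of (25) and (26) exactly as defined.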

There are two considerations that should enter into the particular choice of $p$ for the Fuller points $\mathcal{F}_p$. First, the Fuller points should be numerous enough to sample adequately the worst behavior of any realizable field pattern. In other words, $p$ should be large enough that even the narrowest sidelobe achievable in the field pattern will contain points in $\mathcal{F}_p$. Second, Theorem 1 requires that the denominator matrix $S$ of the DIBCF ratio be positive definite. Normally, the sampling criterion will effect this automatically.


C. Numerical solution of the eigenproblem: Direct method

The eigenproblem (24) is equivalent to the eigenproblem

$$(S^{-1} M) z = \mu z. \qquad (27)$$

In other words, the eigenvalues and eigenvectors of (27) are precisely the same as those of (24). There are two difficulties in using (27) for numerical computation. First, it requires the inverse of the matrix $S$, whose only special structure is that it is positive definite and Hermitian. In general, numerical computation of the inverse of matrices should be avoided if possible. Second, (27) is not a Hermitian eigenproblem; i.e., $S^{-1} M$ is not necessarily Hermitian even though $S$ and $M$ are both Hermitian. This means that the eigenvalues and eigenvectors of (27) must be computed by a routine designed for a general complex matrix, and this means that the eigenvalues can (and do) turn out to be complex numbers because of numerical roundoff. Since Theorem 1 requires that all the eigenvalues be strictly real numbers, there is numerical error in using (27) caused by destruction of the natural Hermitian symmetry in (24). For these reasons, it is desirable to solve the eigenproblem (24) directly.



Martin  and  Wilkinson  give  a  method  and  a  routine 
for  solving  this  eigenproblem  when  Af  and  S  are  real 
symmetric.  Both  the  technique  and  the  routine  can  be 
adapted  to  the  Hermitian  case.  Every  Hermitian  posi¬ 
tive  definite  matrix  S  has  the  Cholesky  decomposition 

S*irr ,  (28) 

where $L$ is a lower triangular matrix. Thus,

$$Mz = \mu LL^{T}z, \qquad L^{-1}Mz = \mu L^{T}z, \qquad L^{-1}M(L^{-T}L^{T})z = \mu L^{T}z,$$

$$(L^{-1}ML^{-T})x = \mu x, \qquad (29)$$

where

$$x = L^{T}z. \qquad (30)$$

Therefore, the eigenvalues of $L^{-1}ML^{-T}$ are precisely the eigenvalues of (24), and the eigenvectors $x$ of $L^{-1}ML^{-T}$ and the eigenvectors $z$ of $S^{-1}M$ are related by (30). Note, also, that (29) is a Hermitian eigenproblem, since $L^{-1}ML^{-T}$ is a Hermitian matrix. It is, therefore, possible to solve (29) by numerical methods designed for Hermitian eigenproblems that explicitly use the fact that the eigenvalues are real (Ref. 5). Consequently, the eigenvalues computed by using (29) will always be real, as required.

This computational procedure seems to require a prohibitively large number of arithmetic operations; however, the computations may be done very efficiently because of the special structure of the matrices involved. For example, the matrix $L^{-1}ML^{-T}$ can be computed without inverting the matrix $L$, using fewer complex multiplications than the computation of $S^{-1}$ alone in (27) requires. In terms of storage required, computation time, and numerical accuracy, the use of (29) and (30) is preferable to the use of (27).
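The reduction (28)-(30) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the PENCLH routine. It assumes that the superscript $T$ in (28)-(30) denotes the conjugate transpose, as the Hermitian setting requires. For brevity it uses `numpy.linalg.solve`, which does not exploit the triangular structure of $L$; a triangular solver such as `scipy.linalg.solve_triangular` would.

```python
import numpy as np

def hermitian_pencil_eig(M, S):
    """All eigenpairs of M z = mu S z, with M Hermitian and S Hermitian positive definite.

    Cholesky reduction: S = L L^H, C = L^{-1} M L^{-H}, x = L^H z.
    Returns real eigenvalues in ascending order and the matching z vectors as columns.
    """
    L = np.linalg.cholesky(S)                 # lower triangular, S = L @ L^H
    Y = np.linalg.solve(L, M)                 # Y = L^{-1} M
    C = np.linalg.solve(L, Y.conj().T)        # Y^H = M L^{-H}, so C = L^{-1} M L^{-H}
    C = (C + C.conj().T) / 2                  # remove roundoff asymmetry
    mu, X = np.linalg.eigh(C)                 # Hermitian solver: mu is real by construction
    Z = np.linalg.solve(L.conj().T, X)        # undo (30): z = L^{-H} x
    return mu, Z

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = A @ A.conj().T + n * np.eye(n)            # hypothetical positive definite S
Q = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = Q @ Q.conj().T                            # hypothetical Hermitian M

mu, Z = hermitian_pencil_eig(M, S)
z = Z[:, -1]                                  # eigenvector of the largest eigenvalue
residual = np.linalg.norm(M @ z - mu[-1] * (S @ z))
ratio = (z.conj() @ M @ z).real / (z.conj() @ S @ z).real
print(mu[-1], ratio, residual)
```

In practice, `scipy.linalg.eigh(M, S)` solves the generalized Hermitian-definite problem directly through LAPACK, which performs a reduction of this kind internally.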

The routine in Martin and Wilkinson was adapted to the Hermitian case, using routines in Ref. 5 to solve the ordinary eigenproblem. This routine is called PENCLH, and its listing is available in Streit. (The listings of the routines used from Ref. 5 are not available; they are proprietary information under the terms of the lease arrangements made with International Mathematical and Statistical Libraries, Incorporated.) Finally, note that the routine PENCLH computes all the eigenvalues and eigenvectors of (24), and not merely the largest eigenvalue and its corresponding eigenvectors.

D. Numerical solution of the eigenproblem: Indirect method

As discussed in the section on the computer storage problem, the drawback to the direct method is excessive computer storage for large arrays. The group coordinate relaxation (or indirect) method overcomes this drawback, but at the cost of computer time and the loss of the ability to compute the lower order eigenvalues and eigenvectors. The group coordinate relaxation method is detailed by Faddeev and Faddeeva (Ref. 8) for the real symmetric eigenproblem $Ax = \mu x$. This method can easily be extended to the Hermitian eigenproblem

$$Mz = \mu Sz. \qquad (31)$$

Roy L. Streit: Discrete arrays of arbitrary geometry 204

Although the method can be extended to arbitrary Hermitian matrices $M$ and $S$, with $S$ positive definite, it is important here to retain the structure of $M$ and $S$ as given by (25) and (26). The reason is that the Hermitian forms of $M$ and $S$ can be evaluated directly without knowledge of any of the entries of either matrix. This is the fact that allows the computer storage problem to be overcome.
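The point that a Hermitian form can be evaluated without the matrix entries can be illustrated with a short sketch. For illustration only, it assumes a sampled form $x^{T}Mx = \sum_d w_d\,|V(d)|^2$, where $V(d)$ is the array field pattern in sampled direction $d$ and $w_d$ is a per-direction weight. The phase convention $\exp(2\pi i\,\mathbf{r}_n\cdot\mathbf{u}/\lambda)$ for omnidirectional elements and the function names are hypothetical and are not taken from (25) and (26). Only one direction's element responses are held in memory at a time.

```python
import numpy as np

def element_responses(positions, u, wavelength=1.0):
    """Responses of omnidirectional elements at `positions` (n x 3) for unit vector u.
    Hypothetical phase convention: exp(2*pi*i * (r_n . u) / wavelength)."""
    return np.exp(2j * np.pi * (positions @ u) / wavelength)

def hermitian_form(x, positions, directions, weights, wavelength=1.0):
    """sum_d w_d |V(d)|^2 with V(d) = sum_n x_n s_n(d), evaluated one direction at a
    time, so no n x n matrix is ever formed (storage grows like n, not n^2)."""
    total = 0.0
    for u, w in zip(directions, weights):
        V = element_responses(positions, u, wavelength) @ x   # field pattern at u
        total += w * abs(V) ** 2
    return total

# Example: 15-element half-wavelength line array along y, nine directions within 8 deg
# of broadside in the xy plane, flat weighting (all of these choices are illustrative).
n = 15
positions = np.column_stack([np.zeros(n), 0.5 * np.arange(n), np.zeros(n)])
angles = np.radians(np.linspace(82.0, 98.0, 9))          # angle from the y axis
directions = np.column_stack([np.sin(angles), np.cos(angles), np.zeros_like(angles)])
weights = np.ones(len(directions))

x = np.ones(n, dtype=complex)
q = hermitian_form(x, positions, directions, weights)
print(q)
```

Building the equivalent matrix explicitly would require $n^2$ complex entries. The loop above needs only $n$ responses at a time plus a running sum.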

The following notation will be very useful. Define the basis vectors

$$e_1 = (1\ 0\ 0\ \cdots\ 0\ 0)^{T}, \quad e_2 = (0\ 1\ 0\ \cdots\ 0\ 0)^{T}, \quad \ldots, \quad e_n = (0\ 0\ 0\ \cdots\ 0\ 1)^{T}. \qquad (32a)$$

Note that each of these vectors is of dimension $n$. To define vectors $e_m$ for $m > n$, we first set

$$l(m) = \begin{cases} n, & \text{if } m \text{ is an integral multiple of } n, \\ m - [m/n]\,n, & \text{otherwise}, \end{cases} \qquad (32b)$$

where $[\ ]$ denotes the greatest integer function. Since (32b) requires that $1 \le l(m) \le n$, we can now define

$$e_m = e_{l(m)}, \qquad m \ge n+1. \qquad (32c)$$

In other words, we have defined

$$e_1 = e_{n+1} = e_{2n+1} = \cdots, \qquad e_2 = e_{n+2} = e_{2n+2} = \cdots, \qquad \ldots \qquad (33)$$
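The index map (32b) and the cyclic basis vectors (32c) translate directly into code. The following is a minimal sketch in Python; the function names are ours.

```python
def cyclic_index(m, n):
    """l(m) from (32b): n when m is a multiple of n, otherwise m - [m/n] * n."""
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive integers")
    return n if m % n == 0 else m - (m // n) * n

def basis_vector(m, n):
    """e_m as a list of length n, using e_m = e_{l(m)} for m > n (32c)."""
    e = [0] * n
    e[cyclic_index(m, n) - 1] = 1
    return e

print([cyclic_index(m, 5) for m in range(1, 13)])  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2]
```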


Before the group coordinate relaxation algorithm can begin, two items must be specified. First, an initial guess

$$a^{(0)} = (a_1^{(0)}, a_2^{(0)}, \ldots, a_n^{(0)})^{T} \qquad (34)$$

for the optimal element excitation vector is required. The vector $a^{(0)}$ should not contain all zero entries, but it is otherwise completely arbitrary. Second, a subarray size $r \ge 1$ must be chosen. It will be shown that working with subarrays of size $r$ requires solving generalized eigenproblems of size $r+1$, so computer storage plays an important role in the choice of $r$. Another important consideration is computation time. In general, the larger $r$ is taken to be, the faster the optimum excitations of the full array can be computed.

The group coordinate relaxation algorithm is most easily described by exhibiting its first two steps, from which the general procedure is easy to see. In the first step, we seek to

$$\max_{x \in \mathcal{V}_0} \frac{x^{T}Mx}{x^{T}Sx}, \qquad (35)$$

where $\mathcal{V}_0$ is the vector space of dimension $r+1$ whose general element, $x$, can be written in the form

$$x = c_0 a^{(0)} + c_1 e_1 + \cdots + c_r e_r \qquad (36)$$

for some complex constants $c_0, c_1, \ldots, c_r$. It is shown below that (35) is a ratio of Hermitian forms in the parameters $c_0, c_1, \ldots, c_r$. Therefore, by Theorem 1, the solution of (35) requires solving an eigenproblem of size $r+1$. Let

$$a^{(1)} = c_0 a^{(0)} + c_1 e_1 + \cdots + c_r e_r \qquad (37)$$

be a vector for which the maximum (35) is attained. This completes the first step. In the second step, we seek to

$$\max_{x \in \mathcal{V}_1} \frac{x^{T}Mx}{x^{T}Sx}, \qquad (38)$$

where $\mathcal{V}_1$ is the vector space of dimension $r+1$ whose general element, $x$, can be written in the form

$$x = c_0 a^{(1)} + c_1 e_{r+1} + \cdots + c_r e_{2r} \qquad (39)$$

for some complex constants $c_0, c_1, \ldots, c_r$. Since (38) is, again, a ratio of Hermitian forms in the parameters $c_0, c_1, \ldots, c_r$, we solve an eigenproblem of size $r+1$ to compute a vector

$$a^{(2)} = c_0 a^{(1)} + c_1 e_{r+1} + \cdots + c_r e_{2r} \qquad (40)$$

for which the maximum (38) is attained. This completes the second step. Continuing in this fashion defines the group coordinate relaxation algorithm.
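The two steps above generalize to the loop below. This is a sketch under stated assumptions, not the author's program. It works with small, explicitly stored matrices $M$ and $S$ and forms the $(r+1)\times(r+1)$ matrices as $P^{H}MP$ and $P^{H}SP$, where the columns of $P$ are the current vector and the $r$ cyclic basis vectors. The paper instead assembles these small matrices from sampled field patterns precisely so that $M$ and $S$ never need to be stored. Indices are 0-based in the code, and the recorded ratio values should never decrease from one step to the next.

```python
import numpy as np

def largest_pencil_eigpair(G, B):
    """Largest eigenvalue and eigenvector of G z = mu B z (G Hermitian, B Hermitian PD)."""
    L = np.linalg.cholesky(B)
    C = np.linalg.solve(L, np.linalg.solve(L, G).conj().T)   # L^{-1} G L^{-H}
    mu, X = np.linalg.eigh((C + C.conj().T) / 2)
    return mu[-1], np.linalg.solve(L.conj().T, X[:, -1])

def ratio(M, S, a):
    """The DIBCF-type ratio a^H M a / a^H S a."""
    return (a.conj() @ M @ a).real / (a.conj() @ S @ a).real

def relaxation_step(M, S, a, start, r):
    """Maximize the ratio over span{a, e_start, ..., e_{start+r-1}} (indices mod n)."""
    n = len(a)
    P = np.zeros((n, r + 1), dtype=complex)
    P[:, 0] = a
    for k in range(r):
        P[(start + k) % n, k + 1] = 1.0
    G = P.conj().T @ M @ P               # (r+1) x (r+1) numerator matrix
    B = P.conj().T @ S @ P               # (r+1) x (r+1) denominator matrix
    mu, c = largest_pencil_eigpair(G, B) # c = (c_0, c_1, ..., c_r)
    a_new = P @ c
    return a_new / np.linalg.norm(a_new), mu

def group_coordinate_relaxation(M, S, a0, r, steps):
    a = np.asarray(a0, dtype=complex)
    history = [ratio(M, S, a)]
    start = 0
    for _ in range(steps):
        a, mu = relaxation_step(M, S, a, start, r)
        history.append(mu)
        start = (start + r) % len(a)     # next block of r cyclic basis vectors
    return a, history

rng = np.random.default_rng(2)
n, r = 12, 5                             # r does not divide n, so blocks overlap over time
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = A @ A.conj().T + n * np.eye(n)
Q = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = Q @ Q.conj().T
a0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)

a, history = group_coordinate_relaxation(M, S, a0, r, steps=200)
exact = np.max(np.linalg.eigvals(np.linalg.solve(S, M)).real)
print(history[:4], history[-1], exact)
```

Each step needs only the current vector, so stopping and restarting the loop requires storing nothing else.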

We see that this algorithm cycles through the entire array using subarrays of size $r$. This is because the basis vectors $\{e_m\}$ are defined to cycle regularly through the vectors $\{e_1, e_2, \ldots, e_n\}$. Also, if $r$ does not divide $n$ evenly, each individual element belongs to a number of different subarrays as the computation proceeds. In other words, if $r$ does not divide $n$, the entire array is not subdivided into disjoint subarrays.
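To see which elements are relaxed together, the sketch below lists the 1-based element numbers used at each step. It applies (32b) to the basis-vector indices $(k-1)r+1, \ldots, kr$ of step $k$. The function name is ours.

```python
def subarray_schedule(n, r, steps):
    """Element numbers relaxed at steps 1..steps, via the cyclic map l(m) of (32b)."""
    def l(m):
        return n if m % n == 0 else m % n
    return [[l(m) for m in range((k - 1) * r + 1, k * r + 1)] for k in range(1, steps + 1)]

print(subarray_schedule(25, 5, 5))   # disjoint blocks: r divides n
print(subarray_schedule(7, 3, 4))    # [[1, 2, 3], [4, 5, 6], [7, 1, 2], [3, 4, 5]]
```

With $n = 7$ and $r = 3$, element 1 is relaxed at steps 1 and 3, each time with different partners.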

The group coordinate relaxation algorithm generates a sequence of vectors $a^{(0)}, a^{(1)}, a^{(2)}, \ldots$ that converges to an eigenvector corresponding to the largest eigenvalue of (24). Convergence is assured regardless of the starting vector, with some highly unlikely exceptions. These exceptions are easy to state. If any of the computed vectors $\{a^{(0)}, a^{(1)}, a^{(2)}, \ldots\}$ is precisely an eigenvector of (24) that corresponds to an eigenvalue other than the largest, the group coordinate relaxation method will not move from this eigenvector. In practice, numerical roundoff error will probably prevent this. For further discussion, and for a convergence theorem whose proof can be extended to the present situation, see Faddeev and Faddeeva (Ref. 8). For possible applications of these mathematical methods to other problems, see Lee (Ref. 11).

An important feature is that each newly computed vector, $a^{(k+1)}$, gives a DIBCF at least as large as that of the previous vector, $a^{(k)}$. This is easy to see from the ratios (35) and (38): the previous vector itself lies in the space over which the maximum is taken (choose $c_0 = 1$ and $c_1 = \cdots = c_r = 0$), so the maximum cannot be smaller than its DIBCF.

Another very useful observation is that the algorithm requires knowledge only of $a^{(k)}$ to compute $a^{(k+1)}$. This means that if the computation must be interrupted for any reason, only the last computed vector needs to be stored in order to restart it.

It is now easy to see how to solve the problem mentioned in the introduction, namely, how to excite (drive) new elements being added to an existing array without changing the excitations of any of the original array elements. Let $N_S$ be the number of elements in the existing array, and let $N_A$ be the number of elements to be added to this array. Now, number the $n = N_S + N_A$ elements in the full array so that the new elements are numbered $1, 2, \ldots, N_A$, and the elements of the original array are numbered $N_A + 1, N_A + 2, \ldots, N_A + N_S$. The solution of this problem is to perform precisely one iteration of the group coordinate relaxation algorithm with the number of elements relaxed equal to $N_A$. In other words, set $r = N_A$ in (36) and compute the $x$ in $\mathcal{V}_0$ for which the maximum in (35) is attained. The required excitations for the additional elements are given explicitly by $a_k = c_k/c_0$, $k = 1, \ldots, N_A$, where we have used the notation of (37).
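A minimal sketch of this single relaxation step follows. It uses hypothetical names and, for brevity, small explicitly stored matrices $M$ and $S$. It also makes one assumption that the text leaves implicit: the starting vector $a^{(0)}$ is the existing excitation vector padded with zeros in the $N_A$ new positions, so that after division by $c_0$ the original excitations are unchanged.

```python
import numpy as np

def excite_added_elements(M, S, a_existing, n_added):
    """Excitations for n_added new elements (numbered first) that maximize
    x^H M x / x^H S x while the existing excitations stay fixed (one step, r = n_added)."""
    a_existing = np.asarray(a_existing, dtype=complex)
    n = n_added + len(a_existing)
    a0 = np.concatenate([np.zeros(n_added, dtype=complex), a_existing])
    P = np.zeros((n, n_added + 1), dtype=complex)
    P[:, 0] = a0                          # column multiplying c_0
    P[:n_added, 1:] = np.eye(n_added)     # columns for e_1, ..., e_{N_A}
    G = P.conj().T @ M @ P
    B = P.conj().T @ S @ P
    L = np.linalg.cholesky(B)
    C = np.linalg.solve(L, np.linalg.solve(L, G).conj().T)   # L^{-1} G L^{-H}
    mu, X = np.linalg.eigh((C + C.conj().T) / 2)
    c = np.linalg.solve(L.conj().T, X[:, -1])               # c = (c_0, c_1, ..., c_{N_A})
    return c[1:] / c[0], mu[-1]                              # a_k = c_k / c_0

rng = np.random.default_rng(3)
n_existing, n_added = 6, 3
n = n_existing + n_added
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = A @ A.conj().T + n * np.eye(n)
Q = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = Q @ Q.conj().T
a_existing = rng.standard_normal(n_existing) + 1j * rng.standard_normal(n_existing)

new, mu = excite_added_elements(M, S, a_existing, n_added)
full = np.concatenate([new, a_existing])
print(new, mu)
```

No other choice of excitations for the added elements, with the original ones held fixed, gives a larger ratio than the returned value `mu`.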


We conclude this section by an examination of the maximum (35). Everything that is said of (35) is easily translated to the maximum (38), as well as to all the other maxima required in the group coordinate relaxation algorithm. Note, first, that putting (36) into (35) gives the identity

$$\frac{x^{T}Mx}{x^{T}Sx} = \frac{z^{T}Gz}{z^{T}Bz}, \qquad (41)$$

where $z = (c_0, c_1, \ldots, c_r)^{T}$, and $G = [g_{kj}]$ and $B = [b_{kj}]$ are $(r+1) \times (r+1)$ Hermitian matrices whose general entries are given by

$$g_{00} = a^{(0)T}Ma^{(0)}, \qquad g_{0k} = g_{k0}^{*} = a^{(0)T}Me_k, \ k = 1, \ldots, r, \qquad g_{kj} = e_k^{T}Me_j, \ k, j = 1, \ldots, r, \qquad (42)$$

$$b_{00} = a^{(0)T}Sa^{(0)}, \qquad b_{0k} = b_{k0}^{*} = a^{(0)T}Se_k, \ k = 1, \ldots, r, \qquad b_{kj} = e_k^{T}Se_j, \ k, j = 1, \ldots, r. \qquad (43)$$

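FIG. 4. Arrangement of elements in example 1 (figure labels: y axis; rings of elements).

Thus, the entries of $G$ and $B$ are computable from the Hermitian forms of $M$ and $S$, respectively. Let $V_0(\theta,\phi)$ be the field pattern of the entire array for the excitations $a^{(0)}$. Then, we have, explicitly,

$$g_{00} = \sum_{(\theta,\phi) \in \mathcal{F}_p \cap \mathcal{M}} N(\theta,\phi)\,|V_0(\theta,\phi)|^2, \qquad (44)$$

$$g_{0k} = g_{k0}^{*} = \sum_{(\theta,\phi) \in \mathcal{F}_p \cap \mathcal{M}} N(\theta,\phi)\,V_0^{*}(\theta,\phi)\,B_k(\theta,\phi)\exp[-(2\pi i/\lambda)\,d_k(\theta,\phi)], \qquad k = 1, \ldots, r, \qquad (45)$$

$$g_{kj} = m_{kj}, \qquad k, j = 1, \ldots, r, \qquad (46)$$

where $m_{kl}$ is given by (25), and similarly,

$$b_{00} = \sum_{(\theta,\phi) \in \mathcal{F}_p \cap (\mathcal{M} \cup \mathcal{S})} N(\theta,\phi)\,|V_0(\theta,\phi)|^2, \qquad (47)$$

$$b_{0k} = b_{k0}^{*} = \sum_{(\theta,\phi) \in \mathcal{F}_p \cap (\mathcal{M} \cup \mathcal{S})} N(\theta,\phi)\,V_0^{*}(\theta,\phi)\,B_k(\theta,\phi)\exp[-(2\pi i/\lambda)\,d_k(\theta,\phi)], \qquad k = 1, \ldots, r, \qquad (48)$$

$$b_{kj} = s_{kj}, \qquad k, j = 1, \ldots, r, \qquad (49)$$

where $s_{kj}$ is given by (26). Because $V_0(\theta,\phi)$ can be computed easily for each $(\theta,\phi)$, we see that (44) through (49) can be computed efficiently in terms of time and core-storage requirements. Now, by using Theorem 1, we see that the maximum of

$$\frac{z^{T}Gz}{z^{T}Bz} \qquad (50)$$

is achieved by any vector

$$z = (c_0, c_1, \ldots, c_r)^{T} \qquad (51)$$

that is an eigenvector of the largest eigenvalue of $Gz = \mu Bz$. Thus, from (41), we see that

$$a^{(1)} = c_0 a^{(0)} + c_1 e_1 + \cdots + c_r e_r \qquad (52)$$

is a vector for which the maximum (35) is attained.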
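The following sketch shows how the small matrices can be accumulated one sampled direction at a time, keeping only $(r+1)\times(r+1)$ matrices in memory. It rests on assumptions not fixed by the text: omnidirectional elements (unit element patterns), a hypothetical phase convention $\exp(2\pi i\,\mathbf{r}_n\cdot\mathbf{u}/\lambda)$, caller-supplied direction samples and weights standing in for the Fuller points and the noise weighting, and the superscript $T$ read as the conjugate transpose. For each direction, the sketch forms the vector $w = (V_0, s_{k_1}, \ldots, s_{k_r})$, made up of the current field pattern and the responses of the relaxed elements, and adds $N\,w^{*}w^{T}$ to the running sum.

```python
import numpy as np

def element_responses(positions, u, wavelength=1.0):
    """Omnidirectional element responses for unit direction u (hypothetical convention)."""
    return np.exp(2j * np.pi * (positions @ u) / wavelength)

def reduced_matrix(a, block, positions, directions, weights, wavelength=1.0):
    """Accumulate sum_d N_d conj(w_d) w_d^T with w_d = (V_0(d), s_block(d)).
    Only one direction's responses and an (r+1) x (r+1) matrix are held at a time."""
    r = len(block)
    H = np.zeros((r + 1, r + 1), dtype=complex)
    for u, weight in zip(directions, weights):
        s = element_responses(positions, u, wavelength)
        w = np.concatenate([[s @ a], s[block]])   # (V_0, responses of relaxed elements)
        H += weight * np.outer(w.conj(), w)
    return H

rng = np.random.default_rng(4)
n = 10
positions = np.column_stack([np.zeros(n), 0.5 * np.arange(n), np.zeros(n)])
dirs = rng.standard_normal((40, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
mainlobe = dirs[:, 0] > 0.5                       # arbitrary split, for illustration only
weights = np.ones(len(dirs))                      # flat weighting (assumption)
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)
block = [0, 1, 2]                                 # relax elements 1-3 (0-based indices)

G = reduced_matrix(a, block, positions, dirs[mainlobe], weights[mainlobe])
B = reduced_matrix(a, block, positions, dirs, weights)   # mainlobe plus sidelobe samples
print(G.shape, B.shape)
```

The resulting `G` and `B` can be passed to any Hermitian-definite eigensolver of size $r+1$, as in (50)-(52).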

III. EXAMPLES

A. Example 1: A 105-element cylindrical array

This example illustrates the use of subarrays (i.e., the group coordinate relaxation method) for computing the optimum DIBCF with limited computer storage. We select an array with 105 elements arranged around a cylinder. Specifically, we first construct 7 rings of 15 elements each and then place the axis of each of these rings along the $x$ axis (see Fig. 4). The exact positions (and element numbers) are given in Table I, where the units of length are such that the wavelength $\lambda = 1$.

Each element of this array has a hemispherical field pattern defined in the following manner. We conceive of the array as being supported by a (transparent) cylinder. Through each element, we pass a tangent plane parallel to the cylinder axis. The field pattern of an element has unit response on the side of the plane that does not contain the cylinder and zero response on the side that does contain the cylinder. We assume that the ambient noise field is flat. Also, we choose $p = 32$ in the definition of the Fuller points.

The mainlobe region, $\mathcal{M}$, is defined as a half cone lying above the positive $x$ axis. Specifically, consider the solid cone with its axis lying along the positive $x$ axis, with its vertex at the origin, and with a vertex angle of 40°. The $xy$ plane slices this cone into two equal parts, and the mainlobe region, $\mathcal{M}$, is defined to be that part of the cone that lies above the $xy$ plane (i.e., points having positive $z$ coordinates). The sidelobe region, $\mathcal{S}$, is defined to be the set of all directions that are not in the mainlobe region, $\mathcal{M}$. There is no ignored region, $\mathcal{O}$, in this example.

TABLE I. Coordinates of elements in example 1.

                  Coordinates
Element no.   x                 y         z
1             0.0000            0.7642    0.0000
2             0.3337            0.7642    0.0000
3             0.6674            0.7642    0.0000
4             1.0011            0.7642    0.0000
5             1.3348            0.7642    0.0000
6             1.6685            0.7642    0.0000
7             2.0022            0.7642    0.0000
8-14          As above          0.6982    0.3108
15-21         As above          0.5114    0.5679
22-28         As above          0.2362    0.7268
29-35         As above         -0.0799    0.7600
36-42         As above         -0.3821    0.6618
43-49         As above         -0.6183    0.4492
50-56         As above         -0.7475    0.1589
57-63         As above         -0.7475   -0.1589
64-70         As above         -0.6183   -0.4492
71-77         As above         -0.3821   -0.6618
78-84         As above         -0.0799   -0.7600
85-91         As above          0.2362   -0.7268
92-98         As above          0.5114   -0.5679
99-105        As above          0.6982   -0.3108
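The geometry of Table I can be regenerated from a few parameters, which is a useful check when reproducing this example. These parameters are read off Table I rather than stated in the text: a ring radius of about 0.7642, an axial spacing of 0.3337, and 15 angular positions spaced 24° apart, measured from the $+y$ axis toward $+z$. The sketch also encodes the hemispherical element pattern and the half-cone mainlobe. Treating the 40° vertex angle as the full apex angle (a 20° half-angle) is our reading of the text.

```python
import math

RADIUS = 0.7642          # inferred from Table I (y coordinate of elements 1-7)
AXIAL_STEP = 0.3337      # inferred from Table I (x spacing of elements 1-7)
N_ANGLES, N_AXIAL = 15, 7

def _angle(num):
    """Angular position (radians) of element num, 24-degree steps from +y toward +z."""
    return 2.0 * math.pi * ((num - 1) // N_AXIAL) / N_ANGLES

def element_position(num):
    """(x, y, z) of element num = 1..105, numbered as in Table I."""
    j = (num - 1) % N_AXIAL                  # ring index along the x axis
    phi = _angle(num)
    return (AXIAL_STEP * j, RADIUS * math.cos(phi), RADIUS * math.sin(phi))

def element_response(num, u):
    """Hemispherical pattern: 1 on the outward side of the tangent plane, 0 otherwise."""
    phi = _angle(num)
    outward = (0.0, math.cos(phi), math.sin(phi))   # outward normal of the cylinder
    return 1.0 if sum(p * q for p, q in zip(u, outward)) > 0.0 else 0.0

def in_mainlobe(u):
    """Half cone about +x with a 20-degree half-angle, restricted to z > 0."""
    x, y, z = u
    norm = math.sqrt(x * x + y * y + z * z)
    return z > 0.0 and x / norm >= math.cos(math.radians(20.0))

for num in (1, 9, 57, 105):
    print(num, [round(c, 4) for c in element_position(num)])
```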

With the above choices, the DIBCF array problem is completely specified. In this case, we use subarrays to optimize the full array because the direct method of optimization requires more core storage on the Univac 1108 than is available. (If the Univac 1108 had virtual memory, the use of subarrays would not be required. On the other hand, one might still use subarrays on a machine with virtual memory for a variety of other reasons.) It seems best to use as many array elements as can be handled easily in the available computer storage, so in this case we choose 69 elements, i.e., roughly two-thirds of the full array. By using the program in Streit (Ref. 5), we require only 45 000 words of main memory.

The group coordinate relaxation scheme required roughly 1650 s per iteration and 5 iterations in all, so the total computation time was roughly 2.25 h. Table II gives the final (optimal) set of element excitations. The vertical field pattern is given in Fig. 5, and Fig. 6 gives the horizontal field pattern for these excitations. We point out that the field patterns in these two figures have abrupt jumps because the individual element field patterns have sharp jumps, due to their assumed hemispherical shape. (These field patterns were computed by the program described by Lee and Leibiger (Ref. 12).) Also, the geometry of the array and of the mainlobe region, $\mathcal{M}$, implies that the optimum field pattern should be symmetric about endfire in the horizontal plane; that is, in Fig. 6, the field pattern should be symmetric about 0°. The fact that it is not is due entirely to ending the computations after the fifth iteration. Further iterations would presumably yield increasingly symmetric horizontal field patterns.

TABLE II. Optimum excitations for example 1.

Element                         Element                         Element
no.   Magnitude Phase           no.   Magnitude Phase           no.   Magnitude Phase
1     0.01497   -2.04150        36    0.00197    1.82093        71    0.03656    2.70718
2     0.04456    1.55478        37    0.00307   -1.47902        72    0.08537   -0.05512
3     0.07601   -1.27318        38    0.00857    1.58849        73    0.12945   -2.85812
4     0.09175    2.15264        39    0.01455   -1.39023        74    0.14265    0.61788
5     0.08206   -0.72447        40    0.01616    1.95539        75    0.11732   -2.17831
6     0.05216    2.68648        41    0.01282   -0.94279        76    0.06927    1.34010
7     0.02126   -0.15355        42    0.00653    2.49085        77    0.02574   -1.30699
8     0.01160   -0.72850        43    0.00811    0.10499        78    0.07023    2.82735
9     0.03588    2.55205        44    0.02391   -2.95328        79    0.18862    0.01720
10    0.06133   -0.39698        45    0.04158    0.42416        80    0.30892   -2.83428
11    0.07342    2.93476        46    0.05011   -2.47388        81    0.35755    0.59043
12    0.06403    0.00079        47    0.04422    0.92678        82    0.30468   -2.26194
13    0.03893   -2.92334        48    0.02716   -1.92583        83    0.18437    1.18512
14    0.01398    0.46931        49    0.01048    1.50198        84    0.06783   -1.59534
15    0.00348   -0.01323        50    0.01266   -1.16074        85    0.05600    2.52014
16    0.01278    3.13121        51    0.03973    2.24429        86    0.13959   -0.31104
17    0.02296    0.29855        52    0.06734   -0.65477        87    0.21923    3.11525
18    0.02717   -2.50653        53    0.07980    2.70791        88    0.24587    0.25434
19    0.02319    0.98764        54    0.06917   -0.20862        89    0.20337   -2.59581
20    0.01409   -1.73254        55    0.04197   -3.12157        90    0.11910    0.86311
21    0.00570    1.78955        56    0.01513    0.26034        91    0.04179   -1.86568
22    0.01032    1.68325        57    0.01676   -2.08531        92    0.02626    2.88925
23    0.02624   -1.20492        58    0.05236    1.52329        93    0.06438    0.22940
24    0.04616    2.09737        59    0.09072   -1.29956        94    0.10232   -2.51417
25    0.05783   -0.90429        60    0.10997    2.13497        95    0.11889    1.00683
26    0.05388    2.37697        61    0.09776   -0.74037        96    0.10393   -1.75955
27    0.03679   -0.61724        62    0.06142    2.67618        97    0.06623    1.76756
28    0.01619    2.70356        63    0.02515   -0.16489        98    0.02674   -0.95527
29    0.03297    1.32646        64    0.01874   -2.67888        99    0.01431   -2.54878
30    0.08731   -1.59998        65    0.05833    0.97472        100   0.04566    1.15251
31    0.14626    1.74147        66    0.10512   -1.80774        101   0.08178   -1.63309
32    0.17553   -1.20741        67    0.13227    1.65509        102   0.10300    1.82950
33    0.15743    2.12465        68    0.12180   -1.18257        103   0.09445   -1.01612
34    0.10196   -0.81950        69    0.07979    2.25832        104   0.06124    2.42174
35    0.04099    2.52482        70    0.03212   -0.58940        105   0.02521   -0.41890

FIG. 5. Vertical field pattern for example 1 with excitations given in Table II. (Horizontal axis: angle measured from x axis, deg.)

This method creates a steadily increasing sequence of estimates for the largest eigenvalue. Since there were five iterations, there were five estimates, and these are given in Table III. Based on this table and on the field patterns of Figs. 5 and 6, it would seem that additional iterations of the algorithm would be only marginally worthwhile. In other words, for all practical purposes, the array excitations have been optimized successfully.


FIG. 6. Horizontal field pattern for example 1 with excitations given in Table II. (Horizontal axis: angle measured from x axis, deg.)


TABLE III. Group coordinate relaxation estimates of largest eigenvalue for example 1.

Iteration no.   Estimate of largest eigenvalue
1               0.91427
2               0.96100
3               0.96374
4               0.96532
5               0.96559

B. Example 2: A comparison with Dolph-Chebyshev design

This example serves two purposes. First, it provides a comparison with the Dolph-Chebyshev line array design. Second, it gives some insight into the nature of the lower order eigenvalues/eigenvectors.

Suppose we have a line array of 15 elements that lies along the $y$ axis (see Fig. 1) with equal spacings of 0.5 wavelength, where the wavelength $\lambda = 1$. The units of length are irrelevant. Thus, if the first element lies at the origin with coordinates (0., 0., 0.), the 15th element has the coordinates (0., 7., 0.). It is well known that any line array of omnidirectional elements has a field pattern with cylindrical symmetry about the array axis. Therefore, we define $\mathcal{M}$ to be the set of all directions that lie within 8° of a normal to the $y$ axis, and we define $\mathcal{S}$ to be the collection of all other directions. Hence, $\mathcal{M}$ is a 16°-wide annulus, and both $\mathcal{M}$ and $\mathcal{S}$ are cylindrically symmetric. The ambient noise field is assumed to be flat, and the individual elements are assumed to be omnidirectional. Finally, in the construction of the Fuller points, $\mathcal{F}_p$, we choose $p = 24$.

The above data completely define the DIBCF array problem. A listing of the entire computer program required for exactly this example is given in Streit (Ref. 1). The results of the execution are given in Table IV. Computation time on the Univac 1108 (under EXEC 8) was about 41 s. The field pattern in the $xy$ plane is given in Fig. 7.

TABLE IV. Excitations for 15-element equispaced line array: Dolph-Chebyshev versus DIBCF.

Element no.   Dolph-Chebyshev   DIBCF (p = 24)
1             0.34371           0.25687
2             0.35775           0.39520
3             0.50403           0.54024
4             0.65338           0.68290
5             0.79108           0.81242
6             0.90242           0.91188
7             0.97487           0.97920
8             1.00000           1.00000
9             0.97487           0.97920
10            0.90242           0.91188
11            0.79108           0.81242
12            0.65338           0.68290
13            0.50403           0.54024
14            0.35775           0.39520
15            0.34371           0.25687

FIG. 7. Field patterns for excitations in Table IV.


The Dolph-Chebyshev excitations are designed exclusively for half-wavelength equispaced line arrays with omnidirectional elements. For a given number of elements, the Dolph-Chebyshev excitations depend only on the steered direction and on the specified sidelobe level. For a broadside (i.e., steered normal to the line of the array) 15-element array, the Dolph-Chebyshev excitations for a field pattern with a 28 dB sidelobe level are given in Table IV. The corresponding field pattern is shown in Fig. 7.

We note that the mainlobe shapes of the Dolph-Chebyshev array and the DIBCF array are indistinguishable. The only difference lies in the sidelobe structure: by sacrificing approximately 3 dB in the sidelobe nearest the mainlobe, all the remaining sidelobes can be made smaller than the overall 28 dB sidelobe level of the Dolph-Chebyshev array.
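For readers who want to regenerate a Dolph-Chebyshev comparison, the sketch below uses one standard construction, not necessarily the one used for Table IV. It samples the Chebyshev pattern $T_{N-1}(x_0\cos(\psi/2))$ at $N$ equally spaced values of $\psi$ and inverts the resulting discrete Fourier relation, where $x_0 = \cosh(\cosh^{-1}(R)/(N-1))$ and $R$ is the mainlobe-to-sidelobe amplitude ratio. It handles an odd number of elements and broadside steering only, and the function names are ours.

```python
import cmath
import math

def chebyshev_t(order, x):
    """Chebyshev polynomial of the first kind T_order(x) for any real x."""
    if abs(x) <= 1.0:
        return math.cos(order * math.acos(x))
    value = math.cosh(order * math.acosh(abs(x)))
    # T_order(-x) = (-1)^order * T_order(x)
    return value if (x > 0 or order % 2 == 0) else -value

def dolph_chebyshev(num_elements, sidelobe_db):
    """Broadside Dolph-Chebyshev weights for an odd number of half-wavelength-spaced
    elements, normalized so that the largest weight is 1."""
    if num_elements < 3 or num_elements % 2 == 0:
        raise ValueError("this sketch handles an odd number of elements (>= 3) only")
    N = num_elements
    half = (N - 1) // 2
    R = 10.0 ** (sidelobe_db / 20.0)            # mainlobe-to-sidelobe amplitude ratio
    x0 = math.cosh(math.acosh(R) / (N - 1))     # chosen so that T_{N-1}(x0) = R
    # Desired pattern T_{N-1}(x0 cos(psi/2)) sampled at psi_k = 2*pi*k/N.
    samples = [chebyshev_t(N - 1, x0 * math.cos(math.pi * k / N)) for k in range(N)]
    # Invert AF(psi_k) = sum_n w_n exp(i (n - half) psi_k) with an inverse DFT.
    weights = []
    for n in range(N):
        acc = sum(samples[k] * cmath.exp(-1j * (n - half) * 2.0 * math.pi * k / N)
                  for k in range(N))
        weights.append((acc / N).real)
    peak = max(weights)
    return [w / peak for w in weights]

weights = dolph_chebyshev(15, 28.0)   # 15 elements, 28 dB sidelobes, as in Table IV
print([round(w, 5) for w in weights])
```

For odd $N$, the desired pattern is a trigonometric polynomial of degree $(N-1)/2$ in $\psi$, so $N$ samples determine the weights exactly. The printed weights can then be compared with the Dolph-Chebyshev column of Table IV.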

What about the lower order eigenvalues? The four largest eigenvalues and their eigenvectors are listed in Table V. (Note that the eigenvector of $\mu_1$ in Table V is the same as the DIBCF excitation in Table IV, but is normalized differently.) The corresponding field patterns are given in Figs. 8-11. We remark only that the field pattern for the largest eigenvalue, $\mu_1$, has no nulls in the mainlobe region, $\mathcal{M}$, whereas the field pattern for $\mu_2$ has one null in $\mathcal{M}$, that for $\mu_3$ has two nulls, and that for $\mu_4$ has three nulls.

TABLE V. The four largest eigenvalues/eigenvectors of example 2.

Element no.   μ1 = 0.9894   μ2 = 0.8206   μ3 = 0.3231   μ4 = 0.0383
1              0.0916        0.2755        0.4486        0.5106
2              0.1409        0.3158        0.3666        0.2141
3              0.1927        0.3289        0.2442       -0.0356
4              0.2436        0.3112        0.1066       -0.2042
5              0.2898        0.2675       -0.0242       -0.2664
6              0.3252        0.1929       -0.1413       -0.2454
7              0.3492        0.1030       -0.2097       -0.1391
8              0.3567        0.0000       -0.2403        0.0000
9              0.3492       -0.1030       -0.2097        0.1391
10             0.3252       -0.1929       -0.1413        0.2454
11             0.2898       -0.2675       -0.0242        0.2664
12             0.2436       -0.3112        0.1066        0.2042
13             0.1927       -0.3289        0.2442        0.0356
14             0.1409       -0.3158        0.3666       -0.2141
15             0.0916       -0.2755        0.4486       -0.5106

C. Example 3: Effects of sampling

The first two examples did not mention the effects of sampling on the field patterns. Specifically, the parameter $p$ in the definition of the Fuller points, $\mathcal{F}_p$, determines how finely all spatial directions are sampled. Hence, the parameter $p$ influences the resulting field patterns. In particular, if $p$ is not sufficiently large, the optimal DIBCF field pattern can have a split beam.

We illustrate this effect by systematically varying $p$ in the array of example 2, but for a different choice of $\mathcal{M}$ and $\mathcal{S}$. Here, we define $\mathcal{M}$ to be the collection of all directions whose projection on the $xz$ plane lies within ±8° of the $z$ axis. Specifically, the direction corresponding to direction cosines $(\alpha, \beta, \gamma)$ lies in $\mathcal{M}$ only if $|\alpha|/(\alpha^2 + \gamma^2)^{1/2} \le \sin 8°$. In other words, $\mathcal{M}$ consists of all directions contained between the two planes intersecting the $yz$ plane at angles of +8° and -8°. The sidelobe region, $\mathcal{S}$, consists of all remaining directions, so there is no ignored region, $\mathcal{O}$. Optimal excitations for several choices of $p$ are given in Table VI. The field patterns for $p = 24$ and $p = 16$ are given in Fig. 12. We do not present the field patterns for $p = 32$ and $p = 40$, because they are very similar to that for $p = 24$.
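The membership test for $\mathcal{M}$ in this example is simple enough to state as code. The sketch below uses a function name of our choosing. How it treats the two directions along the array axis is our assumption: there the projection on the $xz$ plane vanishes, so those directions lie on both bounding planes.

```python
import math

def in_mainlobe_example3(alpha, beta, gamma, half_width_deg=8.0):
    """True if direction cosines (alpha, beta, gamma) satisfy
    |alpha| / sqrt(alpha^2 + gamma^2) <= sin(half_width), i.e. the projection on the
    xz plane lies within +/- half_width of the z axis (beta does not enter the test)."""
    proj = math.hypot(alpha, gamma)
    if proj == 0.0:
        return True  # along the array (y) axis: on both bounding planes (assumed inside)
    return abs(alpha) / proj <= math.sin(math.radians(half_width_deg))

t = math.radians(7.0)
print(in_mainlobe_example3(math.sin(t), 0.0, math.cos(t)))   # 7 degrees from z
```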

Table VI also shows the effects of oversampling. Note that the optimal excitations for $p \ge 24$ are all similar, but they do not seem to be converging to an optimal set. This is probably due to the buildup of numerical roundoff error in the required sums [i.e., (25) and (26)], but it could also be that $p$ must be chosen even larger than 40 before the optimal excitations give the appearance of convergence. In any event, the importance of sampling sufficiently finely is clear, but evidently oversampling wastes time and increases the numerical roundoff error in the computed optimal excitations.
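The sampling cost can be quantified from the last row of Table VI. Each of the four counts listed there equals $10p^2 + 2$, which is the number of vertices obtained by dividing each edge of an icosahedron into $p$ segments. That the Fuller points are constructed this way is our assumption, checked here only against the four tabulated values. Under that formula, the number of sampled directions entering (25) and (26) grows roughly like $p^2$, so going from $p = 24$ to $p = 40$ multiplies it by almost 2.8.

```python
TABLE_VI_FULLER_COUNTS = {16: 2562, 24: 5762, 32: 10242, 40: 16002}

def fuller_point_count(p):
    """Candidate closed form 10 p^2 + 2 (an assumption, checked against Table VI only)."""
    return 10 * p * p + 2

mismatches = {p: count for p, count in TABLE_VI_FULLER_COUNTS.items()
              if fuller_point_count(p) != count}
growth = TABLE_VI_FULLER_COUNTS[40] / TABLE_VI_FULLER_COUNTS[24]
print(mismatches, round(growth, 2))   # {} 2.78
```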

TABLE VI. Effects of sampling on excitations for example 3.

                          p = 16    p = 24    p = 32    p = 40
Element no.
1                          0.2953    0.1275    0.1322    0.1177
2                         -0.0064    0.1676    0.1792    0.1662
3                          0.3846    0.2207    0.2406    0.2232
4                         -0.1045    0.2497    0.2613    0.2581
5                          0.2990    0.2884    0.2946    0.2934
6                         -0.2830    0.3067    0.2991    0.3099
7                          0.1753    0.3337    0.3154    0.3259
8                         -0.3280    0.3346    0.3117    0.3279
9                          0.1753    0.3337    0.3154    0.3259
10                        -0.2830    0.3067    0.2991    0.3099
11                         0.2990    0.2884    0.2946    0.2934
12                        -0.1045    0.2497    0.2613    0.2581
13                         0.3846    0.2207    0.2406    0.2232
14                        -0.0064    0.1676    0.1792    0.1662
15                         0.2953    0.1275    0.1322    0.1177
Largest eigenvalue, μ1     0.1149    0.1243    0.1165    0.1225
No. of Fuller points       2562      5762      10242     16002

D. Example 4: Time and accuracy in the indirect method

We consider a line array of 25 elements that lies along the $y$ axis with equal spacings of 0.5 wavelength, where the wavelength $\lambda = 1$. Thus, the coordinates of the first and last elements are (0., 0., 0.) and (0., 12., 0.), respectively. We select the mainlobe region, $\mathcal{M}$, to be the set of all directions that lie within 5° of a normal to the $y$ axis, and we define $\mathcal{S}$ to be the set of all other directions. There is no ignored region, $\mathcal{O}$. The ambient noise field is flat, and the individual elements are assumed omnidirectional. Finally, we select the Fuller points, $\mathcal{F}_p$. This completely defines our problem.


It is clear from the definition of the indirect, or group coordinate relaxation, method that the size of the subarrays used and the stopping criterion for the iteration procedure both have significant effects on the numerical accuracy of the computed excitations and on the time required to compute them. The following example illustrates how numerical accuracy and computation time depend on both these parameters.


Table VII shows the number of iterations required for various choices of subarray size, $n_s$, and stopping criterion, EPSI, defined by

$$1 - \frac{|\text{old eigenvalue estimate}|}{|\text{new eigenvalue estimate}|} < \mathrm{EPSI}.$$
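As an illustration of such a stopping test, the sketch below applies it to the five eigenvalue estimates of Table III (which come from example 1, not example 4). It assumes the tested quantity is $1 - |\text{old}|/|\text{new}|$; both that reading of the criterion and the function names are assumptions. The sketch reports the first iteration at which the test passes.

```python
TABLE_III_ESTIMATES = [0.91427, 0.96100, 0.96374, 0.96532, 0.96559]

def relative_change(old, new):
    """Assumed stopping quantity: 1 - |old estimate| / |new estimate|."""
    return 1.0 - abs(old) / abs(new)

def first_passing_iteration(estimates, epsi):
    """1-based iteration number at which relative_change first drops below epsi, else None."""
    for i in range(1, len(estimates)):
        if relative_change(estimates[i - 1], estimates[i]) < epsi:
            return i + 1
    return None

changes = [relative_change(o, n) for o, n in zip(TABLE_III_ESTIMATES, TABLE_III_ESTIMATES[1:])]
print([f"{c:.1e}" for c in changes])
print(first_passing_iteration(TABLE_III_ESTIMATES, 1e-3))   # 5
print(first_passing_iteration(TABLE_III_ESTIMATES, 1e-4))   # None
```

Under this reading, EPSI = $10^{-3}$ would have stopped example 1 after exactly the five iterations that were run, while $10^{-4}$ would have required more.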


FIG. 12. Field patterns for example 3.




TABLE VII. Number of iterations required in example 4.

Subarray     EPSI                          Time per
size n_s     10^-3     10^-4     10^-5     iteration (s)
5            9         13        24        83
10           5         10        13        95
15           3         4         6         115
20           3         3         3         140
25           1         1         1         172

As can be expected, the number of iterations required increases with decreasing EPSI and decreases with increasing $n_s$. Also, the computation time per iteration increases with $n_s$.
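Multiplying the two parts of Table VII gives the total time for each setting, which makes the trade-off explicit. The sketch below only does that arithmetic on the tabulated values.

```python
TABLE_VII = {
    # subarray size n_s: ({EPSI: iterations required}, seconds per iteration)
    5: ({1e-3: 9, 1e-4: 13, 1e-5: 24}, 83),
    10: ({1e-3: 5, 1e-4: 10, 1e-5: 13}, 95),
    15: ({1e-3: 3, 1e-4: 4, 1e-5: 6}, 115),
    20: ({1e-3: 3, 1e-4: 3, 1e-5: 3}, 140),
    25: ({1e-3: 1, 1e-4: 1, 1e-5: 1}, 172),
}

def total_seconds(ns, epsi):
    """Iterations required times time per iteration, both read from Table VII."""
    iterations, per_iteration = TABLE_VII[ns]
    return iterations[epsi] * per_iteration

for epsi in (1e-3, 1e-4, 1e-5):
    totals = {ns: total_seconds(ns, epsi) for ns in TABLE_VII}
    best = min(totals, key=totals.get)
    print(f"EPSI={epsi:g}: totals={totals}, fastest n_s={best}")
```

By these figures, the full-array case ($n_s = 25$, a single iteration of 172 s) is the fastest for every EPSI in this 25-element example. Smaller subarrays pay off when storage is the binding constraint, as in example 1.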

An important concern is the numerical accuracy of the computed excitations. This is particularly important because the numerical computation of eigenvectors by any method is less stable than the numerical computation of eigenvalues. Table VIII shows the results obtained for $n_s = 5$ by stopping after each of the first four complete passes through the array, i.e., after iterations 5, 10, 15, and 20, respectively. The exact results are also included. The field patterns corresponding to the excitations of iteration 5 and to the exact excitations are shown in Fig. 13. Note that, at the end of iteration 5, the field pattern already possesses sidelobes in the correct positions, although they are about 3 dB higher than in the field pattern of the exact excitations. Thus, the effect of later iterations is to beat down the sidelobes while maintaining the mainlobe beamwidth.


TABLE VIII. Example 4 with subarrays of five elements.

Element no.  Iteration 5  Iteration 10  Iteration 15  Iteration 20  Exact
1            0.632        0.410         0.623         0.793         1.000
2            0.813        0.622         0.887         1.090         1.339
3            0.993        0.860         1.173         1.406         1.692
4            1.183        1.117         1.428         1.736         2.057
5            1.394        1.398         1.800         2.082         2.436
6            1.446        1.788         2.218         2.461         2.792
7            1.648        2.079         2.539         2.795         3.145
8            1.840        2.355         2.832         3.095         3.456
9            2.023        2.611         3.100         3.366         3.732
10           2.188        2.842         3.328         3.592         3.955
11           2.541        3.143         3.522         3.783         4.120
12           2.664        3.298         3.654         3.904         4.223
13           2.749        3.392         3.722         3.956         4.254
14           2.806        3.439         3.737         3.951         4.223
15           2.809        3.421         3.677         3.877         4.120
16           3.011        3.271         3.559         3.729         3.955
17           2.951        3.145         3.396         3.541         3.732
18           2.834        2.967         3.174         3.298         3.456
19           2.696        2.756         2.929         3.022         3.145
20           2.489        2.494         2.631         2.700         2.792
21           2.012        2.190         2.290         2.353         2.436
22           1.762        1.887         1.957         2.000         2.057
23           1.511        1.567         1.632         1.658         1.692
24           1.260        1.294         1.313         1.324         1.339
25           1.000        1.000         1.000         1.000         1.000
μ_max        0.9798313    0.9857357     0.9880019     0.9886998     0.9889890

IV. SUMMARY

The concept of Directivity Index with Beamwidth Control (DIBC) has been defined as the ratio of the power in the mainlobe region to the total power in both the mainlobe and the sidelobe regions. A mathematically and numerically tractable method for the computation of optimum element excitations (i.e., excitations that maximize DIBC) is presented. A technique known as group coordinate relaxation is shown to be an effective means of computing optimum element excitations for
arrays  of  arbitrary  numbers  of  elements,  yet  it  re¬ 
quires  only  nominal  core  storage.  Conceptually,  the 
group  coordinate  relaxation  technique  employs  subar¬ 
rays  of  the  full  array  in  a  systematic  manner  to  opti¬ 
mize  excitations  of  the  full  array.  Four  examples  have 
been  included,  one  of  which  demonstrates  the  effec¬ 
tiveness  of  group  coordinate  relaxation  for  a  cylindrical 
array  of  105  elements. 
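The DIBC definition can be made concrete by evaluating the ratio for a simple array. The sketch below is an illustration under several assumptions that are ours, not the article's:

- the array is a uniformly spaced line array at half-wavelength spacing;
- the integration variable is the direction cosine u relative to the array axis, so that for a line array equal steps in u correspond to equal solid angle;
- the mainlobe region is |u| ≤ u0 about broadside;
- both integrals are approximated with the midpoint rule.

DIBC is a ratio of two Hermitian forms in the excitations, so maximizing it is a generalized eigenvalue problem. The sketch does not attempt that maximization. It only evaluates the ratio for given weights.

```python
import cmath
import math

def array_factor(weights, u, kd=math.pi):
    """Far-field response at direction cosine u for element n at position n*d on the
    array axis (assumed geometry); kd = 2*pi*d/wavelength, pi for half-wavelength spacing."""
    return sum(w * cmath.exp(1j * kd * n * u) for n, w in enumerate(weights))

def dibc(weights, u0, samples=4000, kd=math.pi):
    """Power in the mainlobe region |u| <= u0 divided by total power over u in [-1, 1],
    both integrated with the midpoint rule."""
    du = 2.0 / samples
    main = total = 0.0
    for i in range(samples):
        u = -1.0 + (i + 0.5) * du
        p = abs(array_factor(weights, u, kd)) ** 2 * du
        total += p
        if abs(u) <= u0:
            main += p
    return main / total

uniform = [1.0] * 10
triangular = [min(n + 1, 10 - n) for n in range(10)]   # hypothetical taper for comparison
for name, w in (("uniform", uniform), ("triangular", triangular)):
    print(name, round(dibc(w, u0=0.2), 4))
```

When u0 = 1 the mainlobe region covers every direction, so the ratio is exactly 1. As u0 shrinks, the ratio drops toward the fraction of power the excitations concentrate near broadside.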

ACKNOWLEDGMENTS 

The author would like to thank Mr. Barry G. Buehler and Dr. Albert H. Nuttall, both of the New London Laboratory, Naval Underwater Systems Center, for their helpful comments on the various drafts of this article. Mr. Buehler also supplied the author with many examples using the approach of this article, one of which is included here as example 1.


1. J. K. Butler and H. Unz, "Beam efficiency and gain optimization of antenna arrays with nonuniform spacings," Radio Sci. 2 (7) (new series), 711-720 (July 1967).

2. J. K. Butler and H. Unz, "Optimization of beam efficiency and synthesis of nonuniformly spaced arrays," Proc. IEEE (Letters) 54, 2007-2008 (December 1966).

3. R. L. Streit, Array Optimization Using Subarrays, NUSC Technical Report 5889 (Naval Underwater Systems Center, New London, CT, 23 March 1979).

4. F. R. Gantmacher, The Theory of Matrices (Chelsea, New York, 1960).

5. The IMSL Library, Volume 2, International Mathematical and Statistical Libraries (IMSL, Inc., Houston, TX, 1977), 6th ed.

6. R. S. Martin and J. H. Wilkinson, "Reduction of the symmetric eigenproblem Ax = λBx and related problems to standard form," Numerische Mathematik 11, 99-110 (1968).

7. D. Slepian and H. O. Pollak, "Prolate spheroidal wave functions, Fourier analysis and uncertainty - I," Bell Syst. Tech. J. 40, 43-63 (1961).

8. D. K. Faddeev and V. N. Faddeeva, Computational Methods of Linear Algebra (Freeman, San Francisco, 1963).

9. T. W. Melnyk, O. Knop, and W. R. Smith, "Extremal arrangements of points and unit charges on a sphere: Equilibrium configurations revisited," Can. J. Chem. 55, 1745-1761 (1977).

10. J. Prenis, "An Introduction to Domes," in The Dome Builder's Handbook (Running Press, Philadelphia, 1973).

11. D. Lee, Maximization of Reverberation Index, NUSC Technical Report 5375 (Naval Underwater Systems Center, New London, CT, 2 October 1976).

12. D. Lee and G. A. Leibiger, Computation of Beam Patterns and Directivity Indices for Three-Dimensional Arrays with Arbitrary Element Spacings, NUSC Technical Report 4687 (Naval Underwater Systems Center, New London, CT, 22 February 1974).




The  Effect  Of  Interchannel  Crosstalk 
On  Array  Performance 

R.  L.  Streit 




The  effect  of  interchannel  crosstalk  on  array  performance 

Roy L. Streit

Naval Underwater Systems Center, New London, Connecticut 06320
(Received 5 May 1987; accepted for publication 16 July 1989)

It  is  shown  that  interchannel  crosstalk  can  always  be  eliminated  before  the  channel  signals 
enter  the  beamformer,  provided  crosstalk  levels  do  not  exceed  a  maximum  permissible  upper 
bound  and  are  known  exactly.  The  crosstalk  upper  bound  (in  decibels)  is  shown  to  be  -  20 
log  N,  where  N  is  the  number  of  channels  in  the  array.  The  beam  pattern  of  a  general  array 
with  arbitrary  crosstalk  levels,  steered  in  any  direction,  is  derived.  Sample  beam  patterns  are 
presented  for  arrays  of  50  and  100  elements.  Expected,  or  average,  beam  patterns  are  derived 
for  interchannel  crosstalk  coefficients  modeled  as  statistically  independent  random  variables.  It 
is  shown  that  pointing  error  can  occur  when  crosstalk  has  nonzero  mean.  It  is  also  shown  how 
to  correct  for  nonzero  mean  crosstalk  before  the  signals  enter  the  beamformer,  provided  these 
means  are  known. 

PACS numbers: 43.60.Gk, 43.30.Yj, 43.88.Hz


INTRODUCTION 

Interchannel crosstalk is modeled throughout this paper as a multiple-input-multiple-output linear system. The inputs to this linear system are the array's sensor outputs, while the system outputs are the inputs to the array beamformer. In Secs. I and II it is assumed that the transfer function matrix H of the linear crosstalk system is known exactly. To simplify the discussion, these two sections present explicitly only the idealized case in which crosstalk is both instantaneous and frequency independent. This simplification in no way restricts the methods and results presented in these sections. A very different crosstalk model is assumed in Sec. III, where the components of the crosstalk transfer function matrix H are treated as random variables. This model is not meant to suggest that crosstalk is truly a random system. Its purpose is to give some rudimentary insight into the consequences of not knowing the crosstalk transfer function matrix H exactly. Whether such a model is satisfactory for the intended purpose depends on the particular application.

Section I of this paper shows that if the interchannel crosstalk levels do not exceed XB = -20 log N (in decibels), then crosstalk can always be eliminated before the channel signals enter the beamformer. The crosstalk bound XB should be interpreted as a theoretical worst-case upper bound on crosstalk levels. This bound is especially important if adaptive beamforming is undertaken. Given linearly independent array sensor outputs, the beamformer input signals are guaranteed to be linearly independent if the crosstalk bound XB is satisfied.

Section II of this paper shows that, for a general array, crosstalk is theoretically equivalent to a channel shading perturbation. An explicit expression is given for the beam pattern of a general, arbitrarily steered array with crosstalk. As will be seen, satisfying the crosstalk bound XB does not by itself prevent sidelobe degradation. In addition, crosstalk almost always causes pointing error; that is, the maximum response of the beam pattern need not occur in the steered direction. Pointing error thus contributes to target bearing estimation error. Examples indicate, however, that pointing error is probably not significant if the bound XB is satisfied.

1827 J. Acoust. Soc. Am. 86 (5), November 1989

Section III of this paper derives the expected, or average, beam pattern of an array with the individual crosstalk coefficients (i.e., the components of the transfer function matrix H) modeled as statistically independent random variables. It is shown that pointing error cannot occur in the expected beam pattern, provided the crosstalk between distinct pairs of channels is zero mean. Pointing error can occur only when crosstalk has nonzero mean. It is also shown how to correct for nonzero mean crosstalk before the signals enter the beamformer.

Section IV gives several example beam patterns, and Sec. V briefly recapitulates the paper's conclusions. Appendix A states Gershgorin's theorem, which is used to derive the crosstalk bound XB. Finally, using the crosstalk model of Sec. III, Appendix B derives the maximum crosstalk variance allowed for a specified increase in the beam pattern sidelobe level.


I.  CROSSTALK  BOUND 

Let $V_n(t)$ denote the output voltage signal of the nth sensor of an N-channel array. Let $U_n(t)$ denote this signal, contaminated by crosstalk, at the input to the beamformer. Ideally, $U_n(t) = V_n(t)$ when the effect of crosstalk between channels is negligible. If crosstalk cannot be ignored, then $\{U_n(t)\}$ is related to the voltage signals $V_1(t), \ldots, V_N(t)$ by the linear relationships

$$U_n(t) = V_n(t) + \sum_{\substack{k=1\\ k\neq n}}^{N} H_{nk} V_k(t), \quad n = 1, \ldots, N, \tag{1}$$

where $\{H_{nk}\}$ are real constants, independent of both time t and the signals $\{V_n(t)\}$. For convenience, we define $H_{kk} = 1$ for all k. Then, for all n and k, $H_{nk}V_k(t)$ is the contribution to the nth beamformer input from the kth sensor output. The units of the beamformer inputs are not stated in (1). It is convenient to suppose that $V_n(t)$ is a voltage, in which case the coefficients $\{H_{nk}\}$ are dimensionless.

Model (1) assumes that no time delay exists between the sensor outputs and their communication to the beamformer. This assumption is very reasonable in practice, and it has the important consequence that the crosstalk coefficients $\{H_{nk}\}$ are all real. If some application requires nonzero time delays, then it is more appropriate to develop a model in the frequency domain. In that case, the coefficients $\{H_{nk}\}$ are complex and, possibly, frequency dependent. None of the results developed in this section depend on $\{H_{nk}\}$ being real constants. Hence they apply on a frequency-by-frequency basis in the frequency domain when crosstalk is modeled as a multiple-input-multiple-output linear system.

Model (1) does not require that the system which communicates the sensor outputs to the beamformer conserve power. It also does not require reciprocity; that is, $H_{nk}$ need not equal $H_{kn}$. However, the model does assume that the sensors are properly calibrated and have the same gain.

Complete crosstalk is defined to be the special case where the output voltage of each sensor makes equal contributions to every beamformer input channel. Thus, when complete crosstalk occurs, all the coefficients $H_{nk}$ are equal to 1. From (1) it follows that

$$U_n(t) = \sum_{k=1}^{N} V_k(t), \quad \text{all } n.$$

In other words, regardless of the nature of the sensor outputs, all beamformer input channels are identical. Complete crosstalk is highly undesirable.

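In matrix form, system (1) can be written

$$U(t) = H\,V(t), \tag{2}$$

where

$$U(t) = \begin{bmatrix} U_1(t)\\ U_2(t)\\ \vdots\\ U_N(t) \end{bmatrix}, \quad V(t) = \begin{bmatrix} V_1(t)\\ V_2(t)\\ \vdots\\ V_N(t) \end{bmatrix}, \quad H = \begin{bmatrix} 1 & H_{12} & \cdots & H_{1N}\\ H_{21} & 1 & \cdots & H_{2N}\\ \vdots & & \ddots & \vdots\\ H_{N1} & H_{N2} & \cdots & 1 \end{bmatrix}.$$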

If the matrix H is invertible, then we have

$$V(t) = H^{-1} U(t). \tag{3}$$

Therefore, crosstalk can be eliminated before the signals enter the beamformer by using (3) to recover the sensor output voltages, provided H is known and the sensors are properly calibrated. The requirement that $H^{-1}$ exists is a critical assumption. In the case of complete crosstalk, for example, the entries of H are identically 1, and so $H^{-1}$ does not exist. Eliminating crosstalk using (3) requires a thorough understanding of the crosstalk mechanism, because every entry of the matrix H must be known with considerable accuracy to evaluate the inverse of H accurately.
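The sketch below shows crosstalk removal with (3) at a single time instant. It assumes an instantaneous, real crosstalk matrix that is known exactly. The matrix size, the uniform distribution used to draw a hypothetical H, and all helper names are illustrative assumptions. The code solves H V = U rather than forming $H^{-1}$ explicitly. This is mathematically the same as (3), and it is the usual numerically preferable way to apply an inverse.

```python
import random

def matvec(A, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (standard library only)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def random_crosstalk_matrix(n_ch, xi, rng):
    """Hypothetical H: unit diagonal, off-diagonal entries drawn uniformly from [-xi, xi]."""
    return [[1.0 if n == k else rng.uniform(-xi, xi) for k in range(n_ch)]
            for n in range(n_ch)]

rng = random.Random(1)
N = 6
H = random_crosstalk_matrix(N, 0.05, rng)     # (N - 1) * 0.05 < 1 keeps H safely invertible
V = [rng.gauss(0.0, 1.0) for _ in range(N)]   # sensor outputs at one instant t
U = matvec(H, V)                              # contaminated beamformer inputs, Eq. (2)
V_recovered = solve(H, U)                     # Eq. (3)
print(max(abs(v - r) for v, r in zip(V, V_recovered)))
```

Because every entry of H is used, any error in the assumed H carries straight into the recovered voltages. This is the sensitivity that the paragraph above warns about.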


In practice, efforts are usually made to minimize crosstalk by appropriate engineering means. We therefore assume that the level of crosstalk between any pair of channels is bounded; that is,

$$|H_{nk}| \le \xi, \quad \text{all } k \neq n, \tag{4}$$

for some constant ξ. The complete crosstalk example shows why the size of ξ matters. Linear independence of the beamformer inputs, given linearly independent sensor outputs, is a very important consideration when specifying ξ, especially in adaptive beamforming. We now derive a theoretical upper bound, denoted XB, which guarantees linear independence of the beamformer inputs whenever crosstalk levels fall below XB. Equivalently, we derive a crosstalk level XB which guarantees that the inverse of H exists no matter what the actual crosstalk coefficients $H_{nk}$ are, so long as they are not larger in magnitude than XB.

Gershgorin's theorem applied to H (see Appendix A), together with the inequalities (4), implies that all the eigenvalues λ of H satisfy the inequality

$$|\lambda - 1| \le (N-1)\,\xi. \tag{5}$$

Since H is invertible if and only if H has no zero eigenvalues, it follows that H is certainly invertible if

$$(N-1)\,\xi < 1,$$

or

$$\xi < \xi_{\max} = \frac{1}{N-1}. \tag{6}$$

This is a sufficient, but not necessary, condition for the existence of $H^{-1}$. Inequality (6) is the crosstalk upper bound mentioned above. The beamformer inputs $\{U_n(t)\}$ are linearly independent if and only if the matrix H is invertible (and, of course, the sensor outputs $\{V_n(t)\}$ are linearly independent). Taking $20\log \xi_{\max}$ gives $-20\log(N-1) \approx -20\log N = \mathrm{XB}$ as the maximum allowed crosstalk level (in decibels) between any two channels.
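The bound (6) and its decibel form take only a few lines to evaluate. The sketch below computes XB for the two array sizes used later in Sec. IV and applies the Gershgorin row-sum test to hypothetical crosstalk matrices. The function names and the constant-crosstalk test matrices are illustrative assumptions.

```python
import math

def crosstalk_bound_db(n_ch):
    """XB in decibels: 20*log10(xi_max) with xi_max = 1/(N - 1), as in (6)."""
    return -20.0 * math.log10(n_ch - 1)

def gershgorin_guarantees_inverse(H):
    """Sufficient (not necessary) test: each Gershgorin disc of H, centred at H_nn = 1
    with radius sum_{k != n} |H_nk|, must exclude the origin."""
    return all(sum(abs(h) for k, h in enumerate(row) if k != n) < 1.0
               for n, row in enumerate(H))

def constant_crosstalk(n_ch, level):
    """Hypothetical H with every off-diagonal entry equal to `level`."""
    return [[1.0 if n == k else level for k in range(n_ch)] for n in range(n_ch)]

for n_ch in (50, 100):
    exact = crosstalk_bound_db(n_ch)
    approx = -20.0 * math.log10(n_ch)
    print(f"N = {n_ch}: -20 log(N - 1) = {exact:.2f} dB, -20 log N = {approx:.2f} dB")

print(gershgorin_guarantees_inverse(constant_crosstalk(50, 0.99 / 49.0)))  # True
print(gershgorin_guarantees_inverse(constant_crosstalk(50, 1.0)))          # False: complete crosstalk
```

For N = 50 and N = 100 the exact values are -33.80 dB and -39.91 dB, which round to the -34 dB and -40 dB bounds quoted in Sec. IV.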

In some applications, the crosstalk matrix H may have special structure; e.g., H may be banded, block diagonal, or sparse. If such structure is present, a less restrictive crosstalk bound than XB may well be appropriate for the given array. Incorporating any such structure into the derivation of XB above is straightforward because of the generality of Gershgorin's theorem. We do not pursue this issue further here, however, because of the many different possible structures for H.

Define $E = H - I$, where I is the identity matrix. Thus E has a zero diagonal, but is otherwise identical to H. Gershgorin's theorem and assumptions (4) and (6) guarantee that all the eigenvalues of E lie inside the unit circle in the complex plane. Since $H = I + E$ and the eigenvalues of E are less than 1 in magnitude, we can write

$$H^{-1} = (I + E)^{-1} = \sum_{n=0}^{\infty} (-1)^n E^n \tag{7}$$

$$\approx I - E. \tag{8}$$

Substituting (8) into (3) gives the potentially very useful approximation

$$V(t) \approx (I - E)\,U(t). \tag{9}$$

Eliminating crosstalk using (9) is much more computationally efficient than using (3). The drawback is that (9) is only an approximation, whereas (3) holds exactly.

Roy L. Streit: Interchannel crosstalk and array performance 1828
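The accuracy of (9) can be made concrete. Since $(I - E)(I + E) = I - E^2$, applying (9) to $U = (I + E)V$ returns $V - E^2V$. The error is therefore second order in the crosstalk level. The sketch below checks this with a hypothetical fixed crosstalk pattern $E_0$ scaled to two levels. Reducing the crosstalk tenfold should reduce the error of (9) about a hundredfold.

```python
import random

def matvec(A, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

def first_order_error(E0, V, scale):
    """Max |V_n - V_n(approx)| when E = scale * E0 and Eq. (9), V ~ (I - E) U, is used."""
    E = [[scale * e for e in row] for row in E0]
    U = [v + ev for v, ev in zip(V, matvec(E, V))]              # U = (I + E) V, Eq. (2)
    V_approx = [u - eu for u, eu in zip(U, matvec(E, U))]       # Eq. (9)
    return max(abs(v - va) for v, va in zip(V, V_approx))

rng = random.Random(2)
N = 8
E0 = [[0.0 if n == k else rng.uniform(-1.0, 1.0) for k in range(N)] for n in range(N)]
V = [rng.gauss(0.0, 1.0) for _ in range(N)]
e1 = first_order_error(E0, V, 1e-2)
e2 = first_order_error(E0, V, 1e-3)
print(e1, e2, e1 / e2)   # ratio close to 100: the residual is exactly E^2 V
```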

Using standard results concerning matrices, (9) gives the following estimate for the total squared difference between the true sensor output vector V(t) and the crosstalk-contaminated beamformer input vector U(t):

$$\sum_{n=1}^{N} |V_n(t) - U_n(t)|^2 \lesssim \Bigl(\sum_{\substack{n,k=1\\ n\neq k}}^{N} |H_{nk}|^2\Bigr)\Bigl(\sum_{m=1}^{N} |U_m(t)|^2\Bigr).$$

This estimate is useful only when most of the crosstalk coefficients are zero.

When is the calculation of $H^{-1}$ numerically accurate? If the computer uses $T > 1$ significant digits, then a sufficient condition for numerical stability is that the "condition number" of H not exceed $10^{T}$. The condition number of H is defined to be the ratio of its largest to its smallest singular value. Assume that H is diagonalizable, which is not a very restrictive assumption in this application. Then the singular values of H are precisely the eigenvalues of H in absolute value (Ref. 1, 4.12.3). Since the eigenvalues of H satisfy inequality (5), the singular values lie in the closed interval $[1 - (N-1)\xi,\ 1 + (N-1)\xi]$. To ensure numerical stability of $H^{-1}$, we require

$$\frac{1 + (N-1)\xi}{1 - (N-1)\xi} < 10^{T}.$$

Solving for ξ gives

$$\xi < \frac{1}{N-1}\,\frac{10^{T} - 1}{10^{T} + 1}.$$

The factor $(10^{T}-1)/(10^{T}+1)$ differs from 1 by only about $2\times 10^{-T}$. To working precision, therefore, this is the same condition (6) that guaranteed the existence of $H^{-1}$. Thus numerical inversion of H is reliable if $\xi < \xi_{\max}$, that is, if the crosstalk bound XB is satisfied.
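The stability condition above is easy to evaluate. The sketch takes the text's premise that the condition number of H is bounded by the ratio of the endpoints of the singular-value interval. It shows how close the resulting crosstalk limit is to $\xi_{\max} = 1/(N-1)$ for typical word lengths. The function names are hypothetical.

```python
def xi_limit_for_digits(n_ch, digits):
    """Largest xi with (1 + (N-1) xi) / (1 - (N-1) xi) < 10**T, per the inequality above."""
    r = 10.0 ** digits
    return (r - 1.0) / ((r + 1.0) * (n_ch - 1))

def condition_number_bound(n_ch, xi):
    """Upper bound on cond(H) implied by the singular-value interval; needs (N-1) xi < 1."""
    a = (n_ch - 1) * xi
    if a >= 1.0:
        raise ValueError("bound requires (N - 1) * xi < 1")
    return (1.0 + a) / (1.0 - a)

N = 50
for T in (3, 7, 12):
    xi = xi_limit_for_digits(N, T)
    print(f"T = {T:2d}: xi * (N - 1) = {xi * (N - 1):.12f}, cond bound = {condition_number_bound(N, xi):.3e}")
```

Even for T = 3 the limit is within 0.2% of $\xi_{\max}$. This is why the stability requirement adds essentially nothing to condition (6).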

It is still necessary, however, to estimate the effect of crosstalk on the beam patterns. As will be seen, the crosstalk bound XB can be satisfied while the beam patterns are still poor. Satisfying XB only guarantees that $H^{-1}$ exists.

II.  BEAM  PATTERNS 

The beamformer output is the weighted delayed sum of the channel outputs

$$B(t) = \sum_{n=1}^{N} a_n U_n(t - \tau_n),$$

where $\{a_n\}$ are the channel weights, and $\{\tau_n\}$ are time delays corresponding to a particular steered beam direction. Substituting the crosstalk effects (1) gives

$$B(t) = \sum_{n=1}^{N} a_n \sum_{k=1}^{N} H_{nk} V_k(t - \tau_n) = \sum_{k=1}^{N}\sum_{n=1}^{N} a_n H_{nk} V_k(t - \tau_n). \tag{10}$$

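For a line array steered broadside, $\tau_n = 0$ for all n. For this particular case, the summation on n can be done separately, giving

$$B(t) = \sum_{k=1}^{N} b_k V_k(t), \tag{11}$$

where

$$\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_N \end{bmatrix} = \begin{bmatrix} 1 & H_{21} & \cdots & H_{N1}\\ H_{12} & 1 & \cdots & H_{N2}\\ \vdots & & \ddots & \vdots\\ H_{1N} & H_{2N} & \cdots & 1 \end{bmatrix} \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_N \end{bmatrix} = H^{T} a. \tag{12}$$

Thus the broadside beam pattern of a line array with crosstalk is an ordinary beam pattern with weights $\{b_n\}$, which are perturbations of the original weights $\{a_n\}$.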
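The equivalence in (11) and (12) can be checked directly at a single time instant. The sketch beamforms crosstalk-contaminated inputs with the design weights a, and separately beamforms the clean sensor outputs with the perturbed weights $b = H^{T}a$. Both are broadside, with $\tau_n = 0$. The crosstalk matrix, weights, and signals are hypothetical values chosen only for illustration.

```python
import random

rng = random.Random(7)
N = 5
# Hypothetical crosstalk matrix: unit diagonal, small positive off-diagonal terms.
H = [[1.0 if n == k else rng.uniform(0.0, 0.02) for k in range(N)] for n in range(N)]
a = [0.5, 0.8, 1.0, 0.8, 0.5]                    # illustrative shading weights
V = [rng.gauss(0.0, 1.0) for _ in range(N)]      # sensor outputs at one instant

U = [sum(H[n][k] * V[k] for k in range(N)) for n in range(N)]    # Eqs. (1)-(2)
B_with_crosstalk = sum(a[n] * U[n] for n in range(N))           # broadside beam, tau_n = 0

b = [sum(H[n][k] * a[n] for n in range(N)) for k in range(N)]    # b = H^T a, Eq. (12)
B_perturbed_weights = sum(b[k] * V[k] for k in range(N))         # Eq. (11)
print(B_with_crosstalk, B_perturbed_weights)
```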

In general, however, the time delays $\{\tau_n\}$ are not zero. Nonetheless, a similar result can be shown to hold. The time delay used for a channel located at position $\mathbf{p}_n = (x_n, y_n, z_n)$, when steered to form a beam in the direction of the unit vector u, is

$$\tau_n = (\mathbf{p}_n \cdot \mathbf{u})/c, \tag{13}$$

where c is the wave propagation velocity. To find the beam pattern of this steered beam, we proceed as follows. Suppose a unit amplitude plane wave of radian frequency ω is propagating from direction v. Note that v is unrelated to the steered direction u. Then,

$$V_k(t) = \exp[i\omega(t + \beta_k)], \quad \text{all } k, \tag{14}$$

where the time delay $\beta_k$ is given by

$$\beta_k = (\mathbf{p}_k \cdot \mathbf{v})/c. \tag{15}$$

The time required for a plane wave to propagate from sensor k to sensor j is therefore $\beta_k - \beta_j$. Because of (14), we have

$$V_k(t - \tau_n) = \exp[i\omega(t + \beta_k - \tau_n)]. \tag{16}$$

Substituting (16) into (10) and taking the magnitude squared of both sides gives the squared amplitude of the steered beamformed output:

$$|B(t)|^2 = \Bigl|\sum_{k=1}^{N}\sum_{n=1}^{N} a_n H_{nk} \exp[i\omega(\beta_k - \tau_n)]\Bigr|^2.$$

The right-hand side of this equation is independent of time, but it does depend on the plane-wave direction v via (15). Because of the assumption of a unit amplitude plane wave, this function is the directional beam pattern, denoted here F(v). Consequently,

$$F(\mathbf{v}) = \Bigl|\sum_{k=1}^{N} \exp(i\omega\beta_k) \sum_{n=1}^{N} a_n H_{nk} \exp(-i\omega\tau_n)\Bigr|^2 = \Bigl|\sum_{k=1}^{N} c_k \exp(i\omega\beta_k)\Bigr|^2, \tag{17}$$

where

$$\begin{bmatrix} c_1\\ c_2\\ \vdots\\ c_N \end{bmatrix} = \begin{bmatrix} 1 & H_{21} & \cdots & H_{N1}\\ H_{12} & 1 & \cdots & H_{N2}\\ \vdots & & \ddots & \vdots\\ H_{1N} & H_{2N} & \cdots & 1 \end{bmatrix} \begin{bmatrix} a_1\exp(-i\omega\tau_1)\\ a_2\exp(-i\omega\tau_2)\\ \vdots\\ a_N\exp(-i\omega\tau_N) \end{bmatrix}. \tag{18}$$

In general, then, crosstalk in a steered array is equivalent to a perturbation of the original weights $\{a_n\}$ after they have been phase shifted to steer a beam. Note that the matrix in (18) is the transpose of H.
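Equations (17) and (18) are enough to compute a steered pattern with crosstalk and to look for pointing error numerically. The sketch below rests on assumptions of our own. The array is a 16-element line array with half-wavelength spacing, and angles are measured from broadside, so that $\omega\beta_k = \pi k\sin\theta_v$ and $\omega\tau_k = \pi k\sin\theta_u$. The weights are uniform. The crosstalk is drawn uniformly on [0, ε] in the spirit of Sec. IV, with a hypothetical level and seed. The peak search covers only a 0.01-degree grid between 20 and 40 degrees, so it can miss a peak outside that window or a pointing error finer than the grid.

```python
import cmath
import math
import random

N = 16
THETA_U = math.radians(30.0)          # steered direction, measured from broadside (assumed)

def phases(theta):
    """omega * (p_k . direction) / c for elements at k * lambda / 2 on a line (assumed)."""
    return [math.pi * k * math.sin(theta) for k in range(N)]

def perturbed_weights(H, a, tau_phase):
    """Eq. (18): c = H^T [a_n exp(-i omega tau_n)]."""
    d = [a[n] * cmath.exp(-1j * tau_phase[n]) for n in range(N)]
    return [sum(H[n][k] * d[n] for n in range(N)) for k in range(N)]

def beam_pattern(c, theta_v):
    """Eq. (17): F(v) = |sum_k c_k exp(i omega beta_k)|^2."""
    return abs(sum(ck * cmath.exp(1j * bk) for ck, bk in zip(c, phases(theta_v)))) ** 2

def peak_direction_deg(c):
    """Direction of maximum response on a 0.01-degree grid from 20 to 40 degrees."""
    grid = [math.radians(x / 100.0) for x in range(2000, 4001)]
    return math.degrees(max(grid, key=lambda t: beam_pattern(c, t)))

a = [1.0] * N
identity = [[1.0 if n == k else 0.0 for k in range(N)] for n in range(N)]
rng = random.Random(5)
eps = 10.0 ** (-30.0 / 20.0)          # hypothetical maximum crosstalk level of -30 dB
H = [[1.0 if n == k else rng.uniform(0.0, eps) for k in range(N)] for n in range(N)]

c_clean = perturbed_weights(identity, a, phases(THETA_U))
c_xtalk = perturbed_weights(H, a, phases(THETA_U))
print(peak_direction_deg(c_clean), peak_direction_deg(c_xtalk))
```

Without crosstalk the peak falls exactly on the steered direction, with value $N^2$. Any shift in the second printed value is the pointing error caused by this particular realization of H.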

The perturbation (18) involves both magnitude and phase effects, so pointing error can occur when the time delays $\tau_n$ are not all zero. For line and planar arrays, therefore, pointing error can arise in any of the nonbroadside steered beams. In general nonplanar arrays, however, pointing error probably occurs in every steered beam. Examples indicate that pointing error is not significant when the crosstalk bound XB is satisfied. Nonetheless, the only way to be certain for any particular array is to calculate the pointing error directly using (17) and (18).

Note that (18) reduces to (12) if all the $\tau_n$ are zero. Note also that (17) and (18) reduce to the usual steered beam pattern if crosstalk is negligible.

In the previous section it was pointed out that crosstalk can be eliminated before the signals enter the beamformer, provided the coefficients $\{H_{nk}\}$ are precisely known. If the corrupted signals have already entered the beamformer, crosstalk can still be eliminated by using (18) to adjust the beamformer weights. The beamformer weights $w = D^{-1}(H^{T})^{-1}Da$ produce the desired weighting Da, where $D = \mathrm{diag}[\exp(-i\omega\tau_1), \ldots, \exp(-i\omega\tau_N)]$ is the diagonal matrix implicit in (18). This is much less efficient than correcting for crosstalk before beamforming, because w depends on the steered direction u.
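The weight correction can be verified numerically. The sketch computes $w = D^{-1}(H^{T})^{-1}Da$ by solving a linear system, then applies (18) with w in place of a. The result should be the crosstalk-free steered weighting Da. The crosstalk matrix, design weights, and steering phases $\omega\tau_n$ are hypothetical values chosen only for illustration.

```python
import cmath
import random

def solve(A, b):
    """Solve A x = b (real or complex) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

rng = random.Random(11)
N = 6
H = [[1.0 if n == k else rng.uniform(0.0, 0.05) for k in range(N)] for n in range(N)]
a = [0.6, 0.9, 1.0, 1.0, 0.9, 0.6]                 # illustrative design weights
omega_tau = [0.4 * n for n in range(N)]           # hypothetical steering phases
D = [cmath.exp(-1j * p) for p in omega_tau]       # diagonal entries of D

H_T = [[H[n][k] for n in range(N)] for k in range(N)]        # H_T[k][n] = H[n][k]
y = solve(H_T, [D[n] * a[n] for n in range(N)])               # y = (H^T)^{-1} D a
w = [y[n] / D[n] for n in range(N)]                           # w = D^{-1} (H^T)^{-1} D a

c = [sum(H[n][k] * D[n] * w[n] for n in range(N)) for k in range(N)]   # Eq. (18) with w
print(max(abs(ck - D[k] * a[k]) for k, ck in enumerate(c)))
```

D depends on the steered direction, so this solve must be repeated for every beam. That is the inefficiency noted in the text.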

III.  EXPECTED  BEAM  PATTERNS 

The crosstalk coefficients in (1) are all real. As was pointed out previously, however, they are complex in the frequency domain if time delays exist in the crosstalk mechanism. Because of this possibility, and because greater generality causes no extra difficulty, in this section we model each crosstalk coefficient $H_{nk}$ as a complex random variable with mean $\bar{H}_{nk}$. Clearly, $\bar{H}_{kk} = 1$ for all k. The time delays $\{\tau_n\}$ and the designed channel weights $\{a_n\}$ are known and fixed, so taking the mean in (18) gives

$$\begin{bmatrix} \bar c_1\\ \bar c_2\\ \vdots\\ \bar c_N \end{bmatrix} = \begin{bmatrix} 1 & \bar H_{21} & \cdots & \bar H_{N1}\\ \bar H_{12} & 1 & \cdots & \bar H_{N2}\\ \vdots & & \ddots & \vdots\\ \bar H_{1N} & \bar H_{2N} & \cdots & 1 \end{bmatrix} \begin{bmatrix} a_1\exp(-i\omega\tau_1)\\ a_2\exp(-i\omega\tau_2)\\ \vdots\\ a_N\exp(-i\omega\tau_N) \end{bmatrix} \tag{19}$$

for the mean values $\{\bar c_n\}$ of the perturbed weights. It is immediately evident from (19) that if the crosstalk is zero mean, that is, $\bar H_{nk} = 0$ for $k \neq n$, then the perturbed weights have mean values equal to the original weights phase shifted to steer a beam in the direction u.

Equation (18) can be used to compute the mean value of the beam pattern, given a probabilistic model of the crosstalk. The simplest model is to suppose that the random variables $\{H_{nk}\}_{k\neq n}$ are statistically independent, that is,

$$E[H_{nk}H_{mj}^{*}] = 0, \tag{20}$$

whenever $(n,k) \neq (m,j)$, $k \neq n$, and $j \neq m$. In (20), * denotes complex conjugation. It is not assumed that the crosstalk coefficients are identically distributed, so we define the notation

$$E[H_{nk}H_{nk}^{*}] = \sigma_{nk}^{2}, \quad \text{for all } k, n. \tag{21}$$

In particular, $\sigma_{kk}^{2} = 1$ because $H_{kk} = 1$ identically. For convenience, let $w = (w_1, \ldots, w_N)^{T}$, where

$$w_k = a_k \exp(-i\omega\tau_k), \quad \text{all } k. \tag{22}$$

Then (18) can be rewritten

$$c_k = w_k + \sum_{\substack{n=1\\ n\neq k}}^{N} H_{nk} w_n. \tag{23}$$

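Using (22), (20), and (21), we can write

$$\begin{aligned}
E[c_k c_j^{*}] &= w_k w_j^{*} + w_k \sum_{\substack{m=1\\ m\neq j}}^{N} \bar H_{mj}^{*} w_m^{*} + w_j^{*} \sum_{\substack{n=1\\ n\neq k}}^{N} \bar H_{nk} w_n + \sum_{\substack{n=1\\ n\neq k}}^{N}\sum_{\substack{m=1\\ m\neq j}}^{N} E[H_{nk}H_{mj}^{*}]\, w_n w_m^{*}\\
&= w_k w_j^{*} + w_k(\bar c_j - w_j)^{*} + w_j^{*}(\bar c_k - w_k) + \sum_{\substack{n=1\\ n\neq k}}^{N}\sum_{\substack{m=1\\ m\neq j}}^{N} E[H_{nk}H_{mj}^{*}]\, w_n w_m^{*}\\
&= w_k \bar c_j^{*} + w_j^{*}\bar c_k - w_j^{*} w_k + \delta_{kj}\sum_{\substack{n=1\\ n\neq k}}^{N} \sigma_{nk}^{2}\,|w_n|^{2},
\end{aligned} \tag{24}$$

where in the last equation $\delta_{kj}$ is Kronecker's delta. For convenience, we define, for any weighting vector $s = (s_1, \ldots, s_N)^{T}$,

$$G_s(\mathbf{v}) = \sum_{k=1}^{N} s_k \exp(i\omega\beta_k). \tag{25}$$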

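The directional beam pattern with crosstalk is therefore, from (23), $|G_c(\mathbf v)|^2$, while the directional beam pattern without crosstalk is, from (22), $|G_w(\mathbf v)|^2$. The expected beam pattern is, from (17),

$$E[F(\mathbf v)] = \sum_{k=1}^{N}\sum_{j=1}^{N} E[c_k c_j^{*}] \exp[i\omega(\beta_k - \beta_j)].$$

Substituting (24) and using the notation (25) gives

$$\begin{aligned}
E[F(\mathbf v)] &= \sum_{k=1}^{N}\sum_{j=1}^{N} \bigl(w_k \bar c_j^{*} + w_j^{*}\bar c_k - w_j^{*} w_k\bigr) \exp[i\omega(\beta_k - \beta_j)] + \sum_{k=1}^{N}\sum_{\substack{n=1\\ n\neq k}}^{N} \sigma_{nk}^{2}\,|w_n|^{2}\\
&= G_w(\mathbf v)\,G_{\bar c}^{*}(\mathbf v) + G_w^{*}(\mathbf v)\,G_{\bar c}(\mathbf v) - |G_w(\mathbf v)|^{2} + \sum_{k=1}^{N}\sum_{\substack{n=1\\ n\neq k}}^{N} \sigma_{nk}^{2}\,|w_n|^{2}.
\end{aligned} \tag{26}$$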


Using the definition (25), note that

$$G_{\bar c}(\mathbf v) = \sum_{k=1}^{N} \bar c_k \exp(i\omega\beta_k) = \sum_{k=1}^{N} w_k \exp(i\omega\beta_k) + \sum_{k=1}^{N} (\bar c_k - w_k)\exp(i\omega\beta_k) = G_w(\mathbf v) + G_{\bar c - w}(\mathbf v).$$

Substituting this identity into (26) gives the result

$$E[F(\mathbf v)] = |G_w(\mathbf v)|^{2} + 2\,\mathrm{Re}\bigl[G_w(\mathbf v)\,G_{\bar c - w}^{*}(\mathbf v)\bigr] + \sum_{k=1}^{N}\sum_{\substack{n=1\\ n\neq k}}^{N} \sigma_{nk}^{2}\,|w_n|^{2}. \tag{27}$$

The middle term on the right-hand side of (27) can be identically zero only if $\bar c = w$. As was pointed out above, $\bar c = w$ if and only if the crosstalk is zero mean, that is, $\bar H_{nk} = 0$ for $k \neq n$. Since crosstalk is not zero mean in general, (27) cannot be further simplified.

The other two terms in (27) also have important interpretations. The first term on the right-hand side of (27) is the directional beam pattern free of crosstalk. The third term in (27) is a positive constant, independent of both the plane-wave direction v and the steered direction u. This follows from the definition (22) of w, since $|w_n| = |a_n|$. With zero mean crosstalk, the middle term of (27) is zero, so the expected beam pattern cannot have nulls. With nonzero mean crosstalk, it is possible, though unlikely, that the middle term is negative enough for $E[F(\mathbf v)]$ to have nulls. In no case, however, can $E[F(\mathbf v)]$ be negative.
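For zero-mean crosstalk, independent zero-mean coefficients satisfy (20), and (27) reduces to the crosstalk-free pattern plus the constant third term. The sketch below checks this closed form against a Monte Carlo average. It rests on assumptions of our own:

- an 8-element uniformly weighted line array steered broadside, at half-wavelength spacing;
- independent circular complex Gaussian off-diagonal coefficients, each with variance $\sigma^2 = 0.01$;
- a fixed random seed.

The two test directions are the mainlobe peak and the first null of the crosstalk-free pattern. At the null, only the constant floor remains, which is why the expected pattern has no nulls.

```python
import cmath
import math
import random

N = 8
SIGMA = 0.1                        # common crosstalk standard deviation (assumed)
a = [1.0] * N
w = [complex(an) for an in a]      # Eq. (22) with broadside steering, tau_n = 0 (assumed)

def G(s, theta_v):
    """Eq. (25) for a half-wavelength line array: omega * beta_k = pi * k * sin(theta_v)."""
    return sum(sk * cmath.exp(1j * math.pi * k * math.sin(theta_v)) for k, sk in enumerate(s))

def expected_pattern(theta_v):
    """Eq. (27) with zero mean crosstalk, so the middle term vanishes."""
    floor = sum(SIGMA ** 2 * abs(w[n]) ** 2 for k in range(N) for n in range(N) if n != k)
    return abs(G(w, theta_v)) ** 2 + floor

def simulated_pattern(thetas, trials, rng):
    """Average |G_c|^2 with c from Eq. (23) and independent zero-mean complex Gaussian
    off-diagonal H_nk satisfying E|H_nk|^2 = SIGMA**2."""
    s = SIGMA / math.sqrt(2.0)
    totals = [0.0] * len(thetas)
    for _ in range(trials):
        c = [w[k] + sum(complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) * w[n]
                        for n in range(N) if n != k) for k in range(N)]
        for i, t in enumerate(thetas):
            totals[i] += abs(G(c, t)) ** 2
    return [tot / trials for tot in totals]

thetas = [0.0, math.asin(0.25)]    # mainlobe peak and first null of the clean pattern
rng = random.Random(2024)
sim = simulated_pattern(thetas, 10000, rng)
for t, m in zip(thetas, sim):
    print(round(math.degrees(t), 2), round(expected_pattern(t), 3), round(m, 3))
```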

Equation (27) can be used to derive an upper bound on the maximum crosstalk variance that guarantees the sidelobe level of the expected beam pattern does not increase by more than a specified amount. See Appendix B for the case of zero mean, identically distributed $H_{nk}$.

It is also clear from (27) that the expected beam pattern cannot have pointing error when the crosstalk is zero mean. In other words, the expected maximum response occurs when v equals the steered direction u, or

$$\max_{\mathbf v} E[F(\mathbf v)] = E[F(\mathbf u)], \tag{28}$$

with zero mean crosstalk. Pointing errors can arise in the expected beam pattern only when the crosstalk is not zero mean.

It is desirable to correct for nonzero mean crosstalk levels before the signals enter the beamformer. Taking the mean in (4) gives

$$|\bar H_{nk}| \le \xi.$$

Requiring the crosstalk upper bound (6) to hold implies that $\bar H^{-1}$ exists. Thus we can write

$$Q(t) = \bar H^{-1} U(t). \tag{29}$$

Using the vector Q(t) as the beamformer input vector results in beamformer input channels with zero mean crosstalk. The reason is that the correction (29) modifies the original model (2), which becomes instead

$$Q(t) = \bar H^{-1} H\, V(t) = \tilde H\, V(t). \tag{30}$$

Clearly, the effective crosstalk matrix $\tilde H$ satisfies $E[\tilde H] = I$. In other words, $\tilde H$ has entries that are zero mean on the off-diagonal and unit mean on the main diagonal. The remarks immediately following (3) concerning the use of the inverse of H apply directly here to the use of the inverse of $\bar H$.

When the crosstalk coefficients are modeled as correlated, jointly Gaussian random variables, so that they are not statistically independent, expected beam patterns can still be derived. The interested reader is referred to Refs. 2 and 3 for a general discussion that is relevant here, even though neither discusses crosstalk.

IV.  EXAMPLES 

As an example of the effect of crosstalk on beam patterns, consider an equispaced line array of N = 50 omnidirectional sensors spaced notionally 1 m apart. The design weights $\{a_n\}$ are Taylor weights for a -30-dB sidelobe level, with the number $\bar n$ of controlled nulls set equal to 5. Crosstalk between any pair of channels is assumed to be such that the coefficients $\{H_{nk}\}$ are all real and positive. These constants are selected from a uniform pseudorandom distribution on the interval [0, ε], where ε is chosen to correspond to a specified maximum crosstalk level. Thus $\bar H_{nk} = \varepsilon/2$, and the crosstalk is not zero mean. The crosstalk bound XB is -34 dB in this case. Figures 1(a)-(d) show the broadside beam patterns for maximum crosstalk levels of -74, -54, -34, and -24 dB, respectively.

- Crosstalk of -74 dB has virtually no impact on the beam pattern [see Fig. 1(a)].
- A crosstalk level of -54 dB, a full 20 dB below the crosstalk bound XB = -34 dB, raises the peak sidelobes by about 1 dB [see Fig. 1(b)].
- When the crosstalk equals the upper bound of -34 dB, the beam pattern is significantly perturbed, with a peak sidelobe about 8 dB above the design sidelobe level of -30 dB [see Fig. 1(c)].
- When crosstalk reaches -24 dB, the peak sidelobe is 13 dB above the design level [see Fig. 1(d)].

We point out that the largest individual perturbation of the original Taylor weights is 0.9%, 8.6%, 60%, and 111% in the cases corresponding to Fig. 1(a)-(d), respectively. The percentage perturbation was calculated after normalizing the Taylor weights and the perturbed weights to sum to one.
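The percentage figures above can be reproduced in outline with a short simulation. The sketch below is a minimal illustration under stated assumptions and is not the computation used for Fig. 1. It assumes the crosstalk matrix H has a unit diagonal, with off-diagonal coefficients drawn from the uniform distribution on [0, ε] and ε = 10^(level/20). It also assumes the beamformer effectively applies the weights Hᵀa. The function name `max_percent_perturbation` is hypothetical, and NumPy's Hamming window stands in for the Taylor weights.

```python
import numpy as np

def max_percent_perturbation(a, level_db, rng):
    """Largest percentage change of any weight after both weight sets are
    normalized to sum to one.

    Assumed crosstalk model (illustrative only): H has a unit diagonal and
    off-diagonal entries drawn uniformly from [0, eps], eps = 10**(level_db/20),
    and the effective weights seen by the beamformer are w = H.T @ a.
    """
    a = np.asarray(a, dtype=float)
    eps = 10.0 ** (level_db / 20.0)
    H = rng.uniform(0.0, eps, size=(a.size, a.size))
    np.fill_diagonal(H, 1.0)              # each channel couples fully to itself
    w = H.T @ a                           # perturbed (effective) weights
    a_n = a / a.sum()                     # normalize both sets to sum to one
    w_n = w / w.sum()
    return 100.0 * np.max(np.abs(w_n - a_n) / np.abs(a_n))

# Hamming weights stand in for the Taylor weights used in the paper.
for level in (-74, -54, -34, -24):
    pct = max_percent_perturbation(np.hamming(50), level, np.random.default_rng(0))
    print(level, round(pct, 2))
```

Each call reuses the same seed, so the four results differ only through ε and increase with the crosstalk level. They will not match 0.9%, 8.6%, 60%, and 111%, because both the weights and the random draws differ. If SciPy is available, `scipy.signal.windows.taylor(50, nbar=5, sll=30)` could be tried in place of the Hamming weights.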

Other realizations of the crosstalk matrix H have been computed, but are not presented here. They reinforce the fundamental point that, for this 50-element array, crosstalk levels should be kept below −54 dB, or 20 dB below XB = −34 dB. The question of whether this observation remains true for larger values of N naturally arises. Consider, then, a 100-sensor equispaced array with 1-m spacing between sensors and Taylor shaded for −30-dB sidelobes (with n̄ = 10 controlled nulls). For crosstalk levels of −70, −60, −50, and −40 dB, the beam patterns are shown in Fig. 2(a)-(d), respectively. The largest individual weight perturbations were 2.1%, 6.4%, 18%, and 45%, respectively. The crosstalk bound XB is −40 dB. Examination of these figures shows that crosstalk levels should be kept below −60 dB, or again 20 dB below the upper bound XB. These two examples therefore suggest that crosstalk should be kept about 20 dB below the bound XB to prevent significant beam pattern degradation.
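The two quoted bounds (−34 dB for N = 50 and −40 dB for N = 100) agree with XB = 20 log10[1/(N − 1)]. That is the level at which the largest Gershgorin radius, (N − 1)ε, of a unit-diagonal crosstalk matrix reaches unity (see Appendix A). This closed form is inferred here from the quoted numbers rather than restated from the derivation of XB, so the sketch below is only a consistency check. The helper `crosstalk_bound_db` is hypothetical.

```python
import math

def crosstalk_bound_db(n_sensors):
    """Assumed form of XB: 20*log10(1/(N - 1)) dB, i.e. the level at which
    (N - 1) * eps = 1 for a unit-diagonal crosstalk matrix."""
    return -20.0 * math.log10(n_sensors - 1)

# Prints -33.8 -39.9, i.e. about -34 dB and -40 dB.
print(round(crosstalk_bound_db(50), 1), round(crosstalk_bound_db(100), 1))
```

Subtracting the 20-dB margin suggested above gives roughly −54 dB and −60 dB, the levels read off Figs. 1 and 2.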

Expected beam patterns for these examples are not presented for two reasons. First, as was pointed out previously, it is desirable to correct nonzero mean crosstalk to zero mean [using (30)] before the signals enter the beamformer, to prevent pointing error. The examples here are not zero mean. Second, assuming zero mean crosstalk, (27) clearly shows that the expected beam pattern is the crosstalk-free beam pattern plus a constant term. The expected beam patterns for these examples, after correcting the crosstalk to zero mean, are merely the appropriate Taylor beam patterns with an appropriate constant term (independent of angle) added; hence, no nulls appear in the expected beam patterns. Examples of this kind are given in Ref. 2.

Roy L. Streit: Interchannel crosstalk and array performance 1831

FIG. 1. Broadside beam patterns for 50-sensor array crosstalk levels: (a) −74 dB, (b) −54 dB, (c) −34 dB, and (d) −24 dB.

The pointing error was determined for the above 100-sensor line array when the maximum crosstalk level was set equal to the bound XB = −40 dB. Beams were steered from broadside to endfire in 1-deg increments, and the pointing error was determined by direct numerical calculation using (17) and (18). The pointing error in all but one beam was found to be within ±0.005 deg of the steered direction. The exceptional beam had a pointing error of −0.14 deg. If the crosstalk coefficients are fixed once and for all, then pointing error is a smoothly varying function of steering angle. In these calculations, however, different crosstalk coefficients were calculated for different beams; this explains how a single beam can stand out. One tentative conclusion in this example is that, as long as the crosstalk bound XB is satisfied, pointing error is not significant. The one exceptional beam suggests, however, that certain realizations of the crosstalk coefficients might result in surprisingly large pointing error even when the bound XB is satisfied.

V.  CONCLUSION 

A maximum permissible level of crosstalk in an array of arbitrary geometric configuration has been presented. Also, the beam pattern of an arbitrary array with arbitrary crosstalk, steered in any direction, has been derived.

Crosstalk can result in pointing error when steering the array, an effect of considerable importance. When a random variable model of crosstalk is appropriate, it is worthwhile in practice to correct for nonzero mean crosstalk before the signals enter the beamformer. If this is done, particular realizations of the crosstalk coefficients may still result in array pointing error, but the expected pointing error is zero because the expected beam patterns do not have pointing error.



[Figure 2, panels (a)-(d); horizontal axis labeled WAVENUMBER]

FIG. 2. Broadside beam patterns for 100-sensor array crosstalk levels: (a) −70 dB, (b) −60 dB, (c) −50 dB, and (d) −40 dB.


ACKNOWLEDGMENTS 

The author wishes to thank Dr. Wayne Strawderman and Dr. Albert Nuttall for their useful comments and suggestions.

APPENDIX  A:  GERSHGORIN'S  THEOREM 

Gershgorin's theorem defines a closed set in the complex z plane within which all the eigenvalues of a general complex-valued matrix must lie. Let $A = (a_{ij})$ denote a given n × n matrix. Define

$$ r_i = \sum_{\substack{j=1 \\ j \ne i}}^{n} |a_{ij}|, \qquad i = 1, \ldots, n. $$

Then the eigenvalues of A lie in the region of the complex z plane consisting of the union of all the closed disks

$$ |z - a_{ii}| \le r_i, \qquad i = 1, \ldots, n. \tag{A1} $$

A proof can be found in many places, for example, Ref. 1, p. 146. If the main diagonal of A is constant, that is, $a_{ii} = a$ for all i, then the disks are concentric. It follows that the eigenvalues of A lie in the closed disk

$$ |z - a| \le \max_{1 \le i \le n} r_i. \tag{A2} $$

The result (A2) together with (4) proves (5) in the main text.
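Gershgorin's theorem is easy to check numerically. The sketch below uses hypothetical helper names. It computes the disk centers $a_{ii}$ and radii $r_i$ of (A1) and confirms that every eigenvalue of a random complex matrix lies in their union. It then illustrates (A2) for a unit-diagonal matrix of the crosstalk type, with an assumed off-diagonal level ε = 0.5/49. Because those disks are concentric about 1 with radius below 1, zero cannot be an eigenvalue, so such a matrix is invertible.

```python
import numpy as np

def gershgorin_disks(A):
    """Centers a_ii and radii r_i = sum_{j != i} |a_ij| of the disks in (A1)."""
    A = np.asarray(A)
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return centers, radii

def eigenvalues_in_union(A, tol=1e-10):
    """Check that every eigenvalue lies in at least one closed disk of (A1)."""
    centers, radii = gershgorin_disks(A)
    return all(np.any(np.abs(lam - centers) <= radii + tol)
               for lam in np.linalg.eigvals(A))

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
print(eigenvalues_in_union(A))            # True, as the theorem guarantees

# Constant diagonal a_ii = 1: all eigenvalues satisfy |z - 1| <= max r_i, (A2).
n, eps = 50, 0.5 / 49                     # hypothetical crosstalk level
H = rng.uniform(0.0, eps, size=(n, n))
np.fill_diagonal(H, 1.0)
_, r = gershgorin_disks(H)
print(np.max(np.abs(np.linalg.eigvals(H) - 1.0)) <= r.max() + 1e-12)  # True
```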

APPENDIX  B:  MAXIMUM  CROSSTALK  VARIANCE  FOR 
SPECIFIED  SIDELOBE  LEVEL 

Attention in this Appendix is restricted to the case where the crosstalk coefficients $H_{nk}$, n ≠ k, are identically distributed with zero mean and common variance σ². It is required to find the largest possible value of σ for a specified maximum increase in the sidelobe level of the expected beam pattern (27). Let the sidelobe level of the crosstalk-free beam pattern G(v) be denoted

$$ L_c = B^2/A^2, $$

where B is the level of the response |G(v)| when v lies in the sidelobe regime, and A is the response |G(v)| when v equals the steered direction u. Thus

$$ A = \left|\sum_{n=1}^{N} w_n\right|. \tag{B1} $$

From (27), we have in this case

$$ E\left[|F(v)|^2\right] = |G(v)|^2 + (N-1)\,a_w\,\sigma^2, \tag{B2} $$

where

$$ a_w = \sum_{n=1}^{N} |w_n|^2. \tag{B3} $$

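Therefore, the sidelobe level of the expected beam pattern may be written

$$ L_E = \frac{B^2 + (N-1)\,a_w\,\sigma^2}{A^2 + (N-1)\,a_w\,\sigma^2}. \tag{B4} $$

We seek the largest value of σ for which

$$ 10\log L_E - 10\log L_c \le 10\log(\Delta), \tag{B5} $$

where 10 log(Δ) is the specified level (in decibels) by which the expected beam pattern sidelobes are allowed to increase. Equivalently,

$$ \frac{L_E}{L_c} = \frac{1 + (N-1)\,a_w\,\sigma^2/B^2}{1 + (N-1)\,a_w\,\sigma^2/A^2} \le \Delta. \tag{B6} $$

Solving for σ², using $B^2 = L_c A^2$ and assuming $\Delta L_c < 1$, gives

$$ \sigma^2 \le \frac{(\Delta - 1)\,L_c}{1 - \Delta L_c}\cdot\frac{A^2}{(N-1)\,a_w}, \tag{B7} $$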

which  is  the  desired  relationship.  It  holds  only  for  zero  mean 
and  identical  variance  crosstalk  coefficients. 

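If 10 log Δ = 1 dB, 10 log $L_c$ = −25 dB, and the weights are identical, $w_n$ = constant ≠ 0, then (B7) gives approximately

$$ \sigma^2 \le 0.000822, $$

or, taking 10 log of both sides,

$$ 20\log(\sigma) \le -31\ \text{dB}. \tag{B8} $$

On the other hand, if 10 log Δ = 2 dB, then (B7) gives instead

$$ 20\log(\sigma) \le -27.3\ \text{dB}. \tag{B9} $$

Interestingly, (B8) and (B9) are essentially independent of N in this case because of the assumption of constant weights. For constant weights $A^2/a_w = N$, so N enters (B7) only through the factor N/(N − 1), which is close to unity for all but very small arrays.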

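The bound (B7) on the maximum crosstalk variance is more strict when the array is shaded (that is, $w_n$ ≠ const) than when it is unshaded ($w_n$ = constant). The proof of this fact follows from the conditions for equality in the Cauchy-Schwarz inequality. Since $|\sum_n w_n|^2 \le N \sum_n |w_n|^2$, with equality if and only if all the $w_n$ are equal, the factor $A^2/a_w$ in (B7) attains its largest value, N, only for uniform weights.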
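Relation (B7) is straightforward to evaluate. The following sketch (the function name is hypothetical) computes the bound on 10 log σ² = 20 log σ from the weights $w_n$, the allowed increase 10 log Δ, and the crosstalk-free sidelobe level 10 log $L_c$, taking A from (B1) and $a_w$ from (B3). Like (B7), it assumes zero-mean crosstalk coefficients with identical variance.

```python
import numpy as np

def max_crosstalk_variance_db(w, increase_db, sidelobe_db):
    """Bound (B7) on 10*log10(sigma**2) for zero-mean, equal-variance crosstalk.

    w           : element weights w_n
    increase_db : allowed sidelobe increase, 10 log(Delta), in dB
    sidelobe_db : crosstalk-free sidelobe level, 10 log(L_c), in dB
    """
    w = np.asarray(w, dtype=complex)
    n = w.size
    delta = 10.0 ** (increase_db / 10.0)
    lc = 10.0 ** (sidelobe_db / 10.0)
    if delta * lc >= 1.0:
        raise ValueError("(B7) requires Delta * L_c < 1")
    a_sq = abs(w.sum()) ** 2               # A**2, from (B1)
    a_w = np.sum(np.abs(w) ** 2)           # a_w, from (B3)
    sigma_sq = (delta - 1.0) * lc * a_sq / ((1.0 - delta * lc) * (n - 1) * a_w)
    return 10.0 * np.log10(sigma_sq)

print(max_crosstalk_variance_db(np.ones(1000), 1.0, -25.0))
print(max_crosstalk_variance_db(np.ones(1000), 2.0, -25.0))
print(max_crosstalk_variance_db(np.hamming(1000), 1.0, -25.0))
```

For 1000 equal weights the first two cases give about −30.85 dB and −27.30 dB, in line with (B8) and (B9). The Hamming-shaded array gives a smaller (stricter) bound, as the Cauchy-Schwarz argument predicts.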


¹M. Marcus and H. Minc, A Survey of Matrix Theory and Matrix Inequalities (Prindle, Weber, and Schmidt, Boston, 1964).

²A. H. Nuttall, Effects of Random Shadings, Phasing Errors, and Element Failures on Beam Patterns of Linear and Planar Arrays, NUSC Technical Report 6191, Naval Underwater Systems Center, New London, CT, 14 March 1980.

³D. J. Ramsdale and R. A. Howerton, "Effect of Element Failure and Random Errors in Amplitude and Phase on the Sidelobe Levels Attainable with a Linear Array," J. Acoust. Soc. Am. 68, 901-906 (1980).




A  Two-Parameter  Family  Of  Weights 
For  Nonrecursive  Digital  Filters  And  Antennas 

R.  L.  Streit 




IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-32, NO. 1, FEBRUARY 1984


A  Two-Parameter  Family  of  Weights  for 
Nonrecursive  Digital  Filters  and  Antennas 

ROY L. STREIT


Abstract-We derive analytically a two-parameter family of weights for use in finite-duration nonrecursive digital filters and in finite-aperture antennas. This family of weights is based on the Gegenbauer orthogonal polynomials, which are a generalization of both Legendre and Chebyshev polynomials. It is shown that one parameter controls the main lobewidth and the other parameter controls the sidelobe taper. For a fixed main lobewidth, it is observed that the Gegenbauer weights can achieve a dramatic decrease in sidelobes "far removed" from the main lobe in exchange for a "small" increase in the first sidelobe adjacent to the main lobe.

The Gegenbauer weights are derived first for discretely sampled apertures and filters. An appropriate limit is then taken to produce the Gegenbauer weighting function for continuously sampled apertures and filters. The continuous Gegenbauer weighting function contains the Kaiser-Bessel function as a special case. It is thus established that the Kaiser-Bessel function is implicitly based on Chebyshev polynomials of the second kind. Furthermore, the Dolph-Chebyshev/van der Maas weights are a limiting case of the discrete/continuous Gegenbauer weights.


I.  Introduction 

The choice of weights in the design of nonrecursive digital filters and antenna apertures is an important problem for which there is a large literature. In this paper we present the Gegenbauer weighting function, so named because it is based on the Gegenbauer orthogonal polynomials [1]. The Gegenbauer weights may be applied equally well to nonrecursive digital filters and to both discrete and continuous antenna apertures. The resulting FIR filter coefficients can be used as a shading function for the spectrum analysis of sampled data to reduce sidelobe leakage. Our discussion in this paper will be restricted to the antenna form of the problem merely to avoid unnecessary complication in the presentation.

The Gegenbauer design is a two-parameter family of weighting functions. One parameter, $z_0$, is used to control the beamwidth. The other parameter, μ, is used to achieve sidelobe taper. Both $z_0$ and μ may be varied continuously and independently of each other. The Gegenbauer design is especially useful in achieving dramatic decreases in distant sidelobes in exchange for "small" increases in the first sidelobe adjacent to the main lobe. Conversely, dramatic increases in distant sidelobes can be exchanged for "small" decreases in the first sidelobe. This will be clarified by the examples.

Manuscript received August 11, 1982; revised November 30, 1982. This work was supported in part by the Office of Naval Research under Project RR014-07-01 and by the Independent Research Program of the Naval Underwater Systems Center.

The author was on leave at the Department of Operations Research, Stanford University, Stanford, CA 94305. He is with the New London Laboratory, Naval Underwater Systems Center, New London, CT 06320, and the Department of Mathematics, University of Rhode Island, Kingston, RI 06320.


The Gegenbauer weights are derived first for a finite discrete aperture. An appropriate limit then gives the Gegenbauer weighting function for a bounded continuous aperture. Many similarities between the Gegenbauer weights and the Dolph-Chebyshev/van der Maas weights [2], [3] will be evident from the derivation. In fact, these latter weights are limiting forms, as μ → 0, of Gegenbauer weights. Also, the Kaiser-Bessel weighting function [4, pp. 232-233] for the continuous aperture is the special case μ = 1 of the Gegenbauer design. This shows that the Kaiser-Bessel function is implicitly based upon Chebyshev polynomials of the second kind, a fact which seems to have escaped notice until now. This is interesting since, as is well known, the Dolph-Chebyshev/van der Maas weights are based on Chebyshev polynomials of the first kind.

One drawback to the van der Maas weighting function for the continuous aperture is that it has δ-function spikes at the aperture endpoints. The Gegenbauer function does not have this feature; that is, the Gegenbauer weighting function for the continuous aperture is a bounded, continuous, real-valued function across the whole aperture. However, since the van der Maas function is a limiting case of the Gegenbauer function as μ → 0, the Gegenbauer function must approximate this behavior in the neighborhood of μ = 0. The Taylor design [5] is an alternative way to overcome this δ-function behavior of the van der Maas function, but it is unrelated to any of the Gegenbauer designs. This statement is borne out by the examples presented later.

The Gegenbauer polynomials $C_n^{\mu}(x)$ are defined here precisely as in Szegő [1], which is used as our standard both in function definition and notation, with only two exceptions: Szegő uses the notation $P_n^{(\mu)}(x)$ instead of $C_n^{\mu}(x)$, and he refers to them as the ultraspherical polynomials. This paper will not attempt to recapitulate any of the known facts about the polynomials that can be referenced in Szegő. It suffices to say here only that $C_n^{\mu}(x)$ is a real-valued polynomial of degree precisely n, and that the system $\{C_0^{\mu}(x), C_1^{\mu}(x), C_2^{\mu}(x), \ldots\}$ is orthogonal on the real interval [−1, +1] with respect to the weight function $(1 - x^2)^{\mu - 1/2}$, provided μ > −1/2, μ ≠ 0. Moreover, by taking appropriate limits and using their hypergeometric functional form, $C_n^{\mu}(x)$ can be defined for all real μ. See [1, eq. (4.7.7)]. In particular, if $T_n(x)$ and $U_n(x)$ denote the Chebyshev polynomials of the first and second kinds, respectively, then [1, eqs. (4.7.8), (4.7.17)]

$$ C_0^{0}(x) = T_0(x), \qquad \lim_{\mu \to 0} \mu^{-1} C_n^{\mu}(x) = \frac{2}{n}\,T_n(x), \quad n \ge 1 \tag{1} $$


STREIT: TWO-PARAMETER FAMILY OF WEIGHTS


and [1, eq. (4.7.2)]

$$ C_n^{1}(x) = U_n(x), \qquad n \ge 0. \tag{2} $$

The derivation of formulas more general than are perhaps necessary in the antenna application is relegated to the Appendix. Special cases of these formulas will be extracted as needed and used without comment in the main body of this paper; however, every effort will be made to motivate the discussion.

II. Gegenbauer Weights for a Discrete Aperture

The Gegenbauer design for a finite discrete aperture is derived for a single-frequency, half-wavelength, equispaced linear array of omnidirectional elements. Other than the steering factor, we will always assume the aperture (discrete or continuous) is symmetrically weighted about the geometric center of the array. The array axis is taken to be the x-axis, and all angles are measured from a line normal to the array axis.

Let N be the number of elements in the array (hence N ≥ 2), and let the positions of these elements be $x_k = k\lambda/2$, k = 1, 2, ..., N, where λ is the wavelength of the design frequency. (In the Appendix, λ denotes an arbitrary real variable, not the wavelength.) If the array is steered to look in the direction $\theta_s$, $-\pi/2 < \theta_s < \pi/2$, and if the array receives a plane wave of wavelength λ from the arrival direction $\theta_a$, $-\pi/2 < \theta_a < \pi/2$, then the complex transfer function of a linear beamformer is given by

$$ F(u) = \sum_{k=1}^{N} w_k \exp(i\pi k u), \tag{3} $$

where

$$ u \triangleq \sin\theta_a - \sin\theta_s, \tag{4} $$

and $\{w_k\}_1^N$ are the individual element weights. Symmetrical weighting is assumed, so $w_{N-k+1} = w_k$ for all k. Positive weighting is desirable, but not necessary.

The Dolph-Chebyshev design proceeds as follows for a design specification of a −s dB peak sidelobe level. Let

$$ z_0 \triangleq \tfrac{1}{2}\left\{\left[r + (r^2-1)^{1/2}\right]^{1/n} + \left[r - (r^2-1)^{1/2}\right]^{1/n}\right\}, \qquad r \triangleq 10^{s/20} \tag{5} $$

and $n \triangleq N - 1$. Notice that $z_0 > 1$ if and only if the peak sidelobe level is lower than the level of the maximum response axis, or MRA. From (A20) of the Appendix, the expansion

$$ T_n(z_0\cos u) = \sideset{}{'}\sum_{k=0}^{\lfloor n/2 \rfloor} c_{k,n}(z_0)\cos[(n-2k)u] \tag{6} $$

clearly exists, where the prime on the summation means that one-half of the last term in the sum is taken if n is even, and all of it is taken if n is odd. From (A21) we have explicitly

$$ c_{k,n}(z_0) = n\,(n-k-1)!\sum_{m=0}^{k}\frac{(m)_{k-m}\,(z_0^2-1)^m\,z_0^{n-2m}}{m!\,(k-m)!\,(n-k-m)!}, \tag{7} $$

where $(m)_j \triangleq m(m+1)\cdots(m+j-1)$ and $(m)_0 = 1$. The coefficients $c_{k,n}(z_0)$ were first given in this form by van der Maas [3], who derived them using a method different from that in the Appendix. By inspection, notice that $c_{k,n}(z_0) > 0$ for all k whenever $z_0 > 1$. The coefficients $c_{k,n}(z_0)$ yield the element weights when we define

for N even:

$$ w_{N-k+1} = w_k = \tfrac{1}{2}\,c_{k-1,N-1}(z_0), \qquad k = 1, 2, \ldots, \tfrac{N}{2}; \tag{8} $$

for N odd:

$$ w_{N-k+1} = w_k = \tfrac{1}{2}\,c_{k-1,N-1}(z_0), \qquad k = 1, 2, \ldots, \tfrac{N+1}{2}. \tag{9} $$

Thus, the complex transfer function (3) is given explicitly for these weights by

$$ F(u) = e^{i\pi(N+1)u/2}\,T_{N-1}\!\left(z_0\cos\tfrac{\pi u}{2}\right); \tag{10} $$

the maximum response occurs for u = 0,

$$ F(0) = T_{N-1}(z_0); \tag{11} $$

and the smallest positive value of u such that F(u) = 0 is given by

$$ u_0 = \frac{2}{\pi}\arccos\left[\frac{1}{z_0}\cos\frac{\pi}{2n}\right]. \tag{12} $$

The half beamwidth as measured to the first null from the MRA is precisely $u_0$.

The Gegenbauer design proceeds in an analogous fashion. We replace the old constant $z_0$ by a new variable $z_\mu$, which will be defined later in (30); however, for μ = 0, $z_\mu$ is still defined by (5). Now, in the expansion

$$ C_n^{\mu}(z_\mu\cos u) = \sideset{}{'}\sum_{k=0}^{\lfloor n/2 \rfloor} b_{k,n}(z_\mu)\cos[(n-2k)u], \tag{13} $$

the coefficients $b_{k,n}(z_\mu)$ depend on $z_\mu$ and are given explicitly by

$$ b_{k,n}(z_\mu) = 2\,(\mu)_{n-k}\sum_{m=0}^{k}\frac{(\mu+m)_{k-m}\,(z_\mu^2-1)^m\,z_\mu^{n-2m}}{m!\,(k-m)!\,(n-k-m)!}. \tag{14} $$

Both of these identities are special cases of (A18) and (A19) of the Appendix. Note that $b_{k,n}(z_\mu) > 0$ for all k, provided that $z_\mu > 1$ and μ > 0. Note also that, by (1), $\mu^{-1} b_{k,n}$ from (14) tends to $(2/n)\,c_{k,n}$ from (7) in the limit as μ → 0. For numerical computation, the following form is preferred to (14). Let $\Delta \triangleq 1 - z_\mu^{-2}$, so that 0 < Δ < 1 when $z_\mu > 1$, and then compute the right-hand side of


$$ b_{k,n}(z_\mu) = 2\,z_\mu^{n}\binom{n-k+\mu-1}{n-k}\sum_{m=0}^{k}\binom{n-k}{m}\binom{k+\mu-1}{k-m}\Delta^m. \tag{15} $$

The binomial coefficients are defined here for any real number a and any nonnegative integer p by

$$ \binom{a}{0} = 1, \qquad \binom{a}{p} = \frac{a(a-1)\cdots(a-p+1)}{p!}, \quad p \ge 1, \tag{16} $$

although they are best computed recursively using

$$ \binom{a}{p} = \frac{a-p+1}{p}\binom{a}{p-1}, \qquad p \ge 1, \tag{17} $$

to avoid floating-point overflow at some intermediate point in the computation.

It should be pointed out that (15) can be evaluated numerically for all μ since, for fixed n and Δ, (15) is a polynomial in μ. However, (15) is correct only if μ ≠ −1, −2, −3, .... If coefficients are required for, say, μ = −5, both sides of (13) must first be divided by μ + 5 and the limit taken as μ + 5 → 0. Consequently, in (15), the factor μ + 5 must be divided out algebraically before numerical computation begins.

The coefficients $b_{k,n}(z_\mu)$ yield element weights $\{w_k\}_1^N$ when we define

for N even:

$$ w_{N-k+1} = w_k = \tfrac{1}{2}\,b_{k-1,N-1}(z_\mu), \qquad k = 1, 2, \ldots, \tfrac{N}{2}; \tag{18} $$

for N odd:

$$ w_{N-k+1} = w_k = \tfrac{1}{2}\,b_{k-1,N-1}(z_\mu), \qquad k = 1, 2, \ldots, \tfrac{N+1}{2}. \tag{19} $$
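The Dolph-Chebyshev recipe in (5)-(9) can be sketched directly. The code below is illustrative and uses hypothetical function names. It evaluates $c_{k,n}(z_0)$ from the van der Maas form (7), with $(m)_j$ the rising factorial, and assigns $w_k = w_{N-k+1} = c_{k-1,N-1}(z_0)/2$. It computes $z_0$ as cosh[(1/n) arccosh r], which equals (5). Two quick checks are available: the weights should reproduce the pattern $|T_{N-1}(z_0\cos(\pi u/2))|$ of (10), and their sum, F(0), should equal r.

```python
import math
import numpy as np

def rising(x, j):
    """Rising factorial (x)_j = x (x + 1) ... (x + j - 1), with (x)_0 = 1."""
    out = 1.0
    for i in range(j):
        out *= x + i
    return out

def dolph_chebyshev_coeffs(z0, n):
    """c_{k,n}(z0) for k = 0, ..., floor(n/2), from the van der Maas form (7)."""
    coeffs = []
    for k in range(n // 2 + 1):
        s = sum(rising(m, k - m) * (z0**2 - 1) ** m * z0 ** (n - 2 * m)
                / (math.factorial(m) * math.factorial(k - m) * math.factorial(n - k - m))
                for m in range(k + 1))
        coeffs.append(n * math.factorial(n - k - 1) * s)
    return coeffs

def dolph_chebyshev_weights(N, s_db):
    """Symmetric weights for an N-element array (N >= 2), -s_db peak sidelobes."""
    n = N - 1
    r = 10.0 ** (s_db / 20.0)
    z0 = math.cosh(math.acosh(r) / n)      # same value as (5)
    c = dolph_chebyshev_coeffs(z0, n)
    half = [0.5 * c[k - 1] for k in range(1, (N + 1) // 2 + 1)]   # (8), (9)
    return np.array(half + half[::-1][N % 2:]), z0   # mirror, center not repeated

w, z0 = dolph_chebyshev_weights(9, 30.0)
print(np.round(w / w.max(), 3))
```

Building only half of the array and mirroring it keeps the symmetry $w_{N-k+1} = w_k$ exact. For odd N, the center element is not duplicated.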


With these weights, the complex transfer function (3) is given explicitly by

$$ F(u) = e^{i\pi(N+1)u/2}\,C_{N-1}^{\mu}\!\left(z_\mu\cos\tfrac{\pi u}{2}\right). \tag{20} $$

The maximum response of F(u) should occur for u = 0, and is

$$ F(0) = C_{N-1}^{\mu}(z_\mu). \tag{21} $$

(For a discussion of unusual situations when the MRA might not occur at u = 0, see below in this section.)

The smallest positive value of u satisfying F(u) = 0 is given by

$$ u_\mu = \frac{2}{\pi}\arccos\frac{x_1^{(\mu)}}{z_\mu}, \tag{22} $$

where $x_1^{(\mu)}$ is the largest zero of the Gegenbauer polynomial $C_{N-1}^{\mu}(x)$. Thus, for μ > −1/2, $x_1^{(\mu)}$ must lie in the open interval (−1, +1). In fact, it must be very near +1 for values of μ of interest in this application. An explicit analytic expression for $x_1^{(\mu)}$ is not known except in certain special cases (e.g., the Chebyshev polynomials), so it must be found numerically. This minor difficulty is readily overcome using Newton-Raphson iteration. Recall n = N − 1. Since [1, eq. (4.7.14)]

$$ \frac{d}{dx}C_n^{\mu}(x) = 2\mu\,C_{n-1}^{\mu+1}(x), \tag{23} $$

the Newton-Raphson iteration is

$$ y_{k+1} = y_k - \frac{C_n^{\mu}(y_k)}{2\mu\,C_{n-1}^{\mu+1}(y_k)}, \qquad k = 1, 2, \ldots. \tag{24} $$

The ratio in (24) is perhaps best evaluated by computing two different sequences

$$ \{C_p^{\mu}(y_k)\}_{p=0}^{n} \quad\text{and}\quad \{C_p^{\mu+1}(y_k)\}_{p=0}^{n-1} \tag{25} $$

numerically from the fundamental recursion [1, eq. (4.7.17)]

$$ p\,C_p^{\alpha}(x) = 2(p+\alpha-1)\,x\,C_{p-1}^{\alpha}(x) - (p+2\alpha-2)\,C_{p-2}^{\alpha}(x), \quad p = 2, 3, \ldots, n; \qquad C_0^{\alpha}(x) = 1, \quad C_1^{\alpha}(x) = 2\alpha x. \tag{26} $$

The recursion (26) is valid for α ≠ 0, −1, −2, −3, .... This method may have weaknesses whenever μ is very close to 0 (say, |μ| < 10^{−4}) because of the division by μ in (24). However, μ would normally be taken either equal to 0 (to give the Dolph-Chebyshev design) or else sufficiently different from 0 to affect sidelobe levels appreciably. This latter stipulation seems to require |μ| > 10^{…}. In the antenna application, then, computation of the Newton-Raphson iteration step from the recursion (26) seems perfectly safe whenever a special precaution is taken for μ = 0. In practice this author has never seen the iteration require more than four steps, and he has never seen it converge to the wrong point. If, however, it should ever happen to converge to the wrong point, the Newton-Raphson iteration can be restarted with the new initial point $y_1 = 1$. Also, the inequality [1, eq. (6.21.3)]

$$ \frac{\partial}{\partial\mu}\,x_1^{(\mu)} < 0 \quad \text{for all } \mu \tag{27} $$

implies that

$$ x_1^{(\mu)} < x_1^{(0)} = \cos\frac{\pi}{2n} < x_1^{(-\mu)}, \qquad \mu > 0, \tag{28} $$

which can serve as a check. Incidentally, inequality (27) holds for all the positive zeros of $C_n^{\mu}(x)$, not merely the largest one.
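The zero-finding procedure in (23)-(26) can be sketched as follows. The helper `gegenbauer` evaluates $C_n^{\alpha}(x)$ with the recursion (26), and `largest_zero` applies the Newton-Raphson step (24) starting from $y_1 = 1$. Both names, the tolerance, and the step cap are illustrative choices. The case μ = 0 must be handled separately, as noted above.

```python
import math

def gegenbauer(n, alpha, x):
    """C_n^alpha(x) from the three-term recursion (26)."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * alpha * x
    for p in range(2, n + 1):
        c_prev, c = c, (2.0 * (p + alpha - 1) * x * c - (p + 2 * alpha - 2) * c_prev) / p
    return c

def largest_zero(n, mu, tol=1e-14, max_steps=100):
    """Largest zero of C_n^mu by the Newton-Raphson iteration (24), y_1 = 1.

    Uses dC_n^mu/dx = 2 mu C_{n-1}^{mu+1}(x) from (23); mu must be nonzero.
    """
    y = 1.0
    for _ in range(max_steps):
        step = gegenbauer(n, mu, y) / (2.0 * mu * gegenbauer(n - 1, mu + 1.0, y))
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton-Raphson iteration did not converge")

# mu = 1 gives U_5, whose largest zero is cos(pi/6).
print(largest_zero(5, 1.0), math.cos(math.pi / 6))
```

Starting at y = 1, to the right of every zero, is what makes the iteration reliable. For a polynomial with only real zeros, Newton steps taken from beyond the largest zero decrease monotonically toward it, which is consistent with the author's experience of fast, correct convergence.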

The reason for all this concern over calculation of the half-beamwidth (22) is simply to be able to make fair comparisons between sidelobe levels of different Gegenbauer designs, that is, different values of μ. It is well known that the sidelobe levels in Dolph-Chebyshev beam patterns are sensitive functions of the beamwidth, and there is every reason to expect similar behavior in the Gegenbauer designs. Therefore, as μ is varied it is helpful to maintain a fixed beamwidth; specifically, we always require $u_\mu = u_0$ for all μ. This in turn, from (22) and (12), gives

$$ \frac{x_1^{(\mu)}}{z_\mu} = \frac{1}{z_0}\cos\frac{\pi}{2n} \tag{29} $$

or, converting convenience into a definition,

$$ z_\mu \triangleq z_0\,x_1^{(\mu)}\sec\frac{\pi}{2n}. \tag{30} $$

From (30) it is now clear that computing the largest zero $x_1^{(\mu)}$ of $C_{N-1}^{\mu}(x)$ is of considerable importance.

With the definition (30), all Gegenbauer designs with different values of μ and fixed $z_0$ have the same beamwidth as measured to the first null off the MRA. Thus, the beamwidth is varied simply by changing the value of $z_0$ in exactly the same way as in Dolph-Chebyshev, i.e., (5).
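Combining (5), (26), (24), (30), (15), and (18)-(19) gives a complete, if unoptimized, sketch of the discrete Gegenbauer design for μ > 0 and N ≥ 3. The function names are hypothetical, and the binomial coefficients use the recursion (17). The weight indexing $w_k = w_{N-k+1} = b_{k-1,N-1}(z_\mu)/2$ is an assumption, chosen so that the weights reproduce the transfer function (20). The checks confirm that the pattern matches (20) and that the first null falls at the Dolph-Chebyshev value $u_0$ of (12).

```python
import math
import numpy as np

def gegenbauer(n, alpha, x):
    """C_n^alpha(x) from the three-term recursion (26)."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * alpha * x
    for p in range(2, n + 1):
        c_prev, c = c, (2.0 * (p + alpha - 1) * x * c - (p + 2 * alpha - 2) * c_prev) / p
    return c

def largest_zero(n, mu, tol=1e-14, max_steps=100):
    """Largest zero of C_n^mu by the Newton-Raphson iteration (24) from y_1 = 1."""
    y = 1.0
    for _ in range(max_steps):
        step = gegenbauer(n, mu, y) / (2.0 * mu * gegenbauer(n - 1, mu + 1.0, y))
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton-Raphson iteration did not converge")

def binom(a, p):
    """Binomial coefficient (16) for real a, built up with the recursion (17)."""
    out = 1.0
    for q in range(1, p + 1):
        out *= (a - q + 1) / q
    return out

def gegenbauer_weights(N, s_db, mu):
    """Sketch of the discrete Gegenbauer weights for mu > 0 and N >= 3."""
    n = N - 1
    z0 = math.cosh(math.acosh(10.0 ** (s_db / 20.0)) / n)            # (5)
    z_mu = z0 * largest_zero(n, mu) / math.cos(math.pi / (2 * n))     # (30)
    delta = 1.0 - z_mu ** -2                                           # Delta in (15)
    b = []
    for k in range(n // 2 + 1):                                        # (15)
        s = sum(binom(n - k, m) * binom(k + mu - 1, k - m) * delta ** m
                for m in range(k + 1))
        b.append(2.0 * z_mu ** n * binom(n - k + mu - 1, n - k) * s)
    half = [0.5 * b[k - 1] for k in range(1, (N + 1) // 2 + 1)]       # (18), (19)
    return np.array(half + half[::-1][N % 2:]), z_mu

w, z_mu = gegenbauer_weights(16, 30.0, 0.5)
print(round(z_mu, 4), np.round(w / w.max(), 3))
```

Setting μ = 1/2 makes the underlying polynomial a Legendre polynomial. Other values of μ > 0 can be substituted freely as long as $z_\mu$ stays above 1, which keeps every term of (15) positive.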

An interesting consequence of (30) is that $z_\mu$ might not always be greater than 1 for all μ > 0. This observation follows immediately from the derivative inequality (27). Hence, for some critical positive value of μ, say μ*, we have $z_{\mu^*} = 1$. In (15) the number Δ is negative for μ > μ*, so the positivity of the weights cannot be guaranteed without direct calculation because (15) is an alternating series for μ > μ*. At the critical point μ*, Δ = 0 and the sum in (15) collapses to a single term. Simplifying gives

$$ b_{k,n}(1) = 2\binom{n-k+\mu^*-1}{n-k}\binom{k+\mu^*-1}{k}, \tag{31} $$

which can be found also in Szegő [1, eq. (4.9.19)]. The weights for the critical case μ = μ* can now be varied merely by changing μ*. In particular, for μ* = 1, (31) gives the uniformly weighted array, that is, $w_k = 1$ for all k. The beamwidth obtained from the weights (31) depends on (and only on) the critical value μ* because μ* implicitly depends on $z_0$.

Since Gegenbauer designs have the two parameters $z_0$ and μ, with $z_0$ controlling main lobewidth, the parameter μ must control sidelobe behavior. From (20) and (30) we see that sidelobes occur for u satisfying

$$ \left|z_\mu\cos\tfrac{\pi u}{2}\right| \le z_\mu\cos\tfrac{\pi u_0}{2} = x_1^{(\mu)} < 1. \tag{32} $$

In the sidelobe region, then, we can define

$$ \cos\varphi \triangleq z_\mu\cos\tfrac{\pi u}{2}, \qquad 0 < \varphi < \pi. $$

For the moment let us suppose 0 < μ < 1. Then, from Szegő [1, eq. (7.33.5)],

$$ (\sin\varphi)^{\mu}\,\left|C_n^{\mu}(\cos\varphi)\right| < \frac{2^{1-\mu}\,n^{\mu-1}}{\Gamma(\mu)}, \tag{33} $$

so the transfer function F(u) must satisfy

$$ |F(u)| < \left(1 - z_\mu^2\cos^2\tfrac{\pi u}{2}\right)^{-\mu/2}\frac{2^{1-\mu}\,n^{\mu-1}}{\Gamma(\mu)} \tag{34} $$

throughout the sidelobe region defined by (32). For μ outside the (0, 1) interval, but excluding μ = 0, −1, −2, ..., the sharpness of the inequality (34) is lost. A special case of a result given in Szegő [1, eq. (8.21.14) with p = 1] implies that

$$ |F(u)| \le \left(1 - z_\mu^2\cos^2\tfrac{\pi u}{2}\right)^{-\mu/2}\frac{2^{1-\mu}\,\Gamma(n+\mu)}{\Gamma(\mu)\,\Gamma(n+1)} + O\!\left(n^{\mu-2}\right) \tag{35} $$

throughout the sidelobe region defined by (32). The factor $\Gamma(n+\mu)/[\Gamma(\mu)\Gamma(n+1)]$ in (35) is asymptotic to $n^{\mu-1}/\Gamma(\mu)$ as n → ∞, so the leading term of the right-hand side of (35) is asymptotic to the right-hand side of (34). For fixed μ, the right-hand side of (35) appears to be an excellent envelope for the sidelobes of the Gegenbauer designs.

For μ > 0, it is clear from (35) that the sidelobe envelope must steadily decay as u approaches endfire, i.e., u = 1. Since [use (26)]

$$ C_n^{\mu}(0) = \begin{cases} 0, & n \text{ odd},\\[2pt] (-1)^m\dbinom{m+\mu-1}{m}, & n = 2m, \end{cases} $$

we have

$$ |F(1)| = \left|C_n^{\mu}(0)\right| \approx \frac{2^{1-\mu}\,n^{\mu-1}}{\Gamma(\mu)} \tag{36} $$

approximately, for n even. Contrasting this approximation with the inequality (35) leads to the conclusion that (36) is an excellent approximation to the sidelobe envelope at endfire for both even and odd n. Thus, we utilize (36) for all n. Applying results proved below in another context [specifically, set v = 0 in (54) and (55)] gives an approximation (37) for the maximum response |F(0)|. It is expressed in terms of r, defined by (5), the modified Bessel function $I_{\mu-1/2}(x)$ of the first kind and order μ − 1/2, and the quantity

$$ \left[(\operatorname{arccosh} r)^2 + \frac{\pi^2}{4} - j_{\mu-1/2}^2\right]^{1/2}, \tag{38} $$

where $j_{\mu-1/2}$ is the smallest positive zero of the Bessel function $J_{\mu-1/2}(x)$ of the first kind and order μ − 1/2. Therefore, we have the relative level (39).

This result happens to be exact for μ = 0, the Dolph-Chebyshev case, as can be easily verified. Evidently this result also implies that the sidelobe height at endfire is a function of n, even when μ and the beamwidth parameter $z_0$ are fixed. In other words, the sidelobe tapering effect of a given value of μ depends on n, unless μ = 0. Numerical examples bear out the $n^{\mu}$ dependence in (39).
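The endfire approximation can be compared with the exact value of $C_n^{\mu}(0)$. The sketch below uses hypothetical names. It takes the approximation in the form $2^{1-\mu}n^{\mu-1}/\Gamma(\mu)$ and is restricted to μ > 0, so that log-gamma can be used without sign bookkeeping. For μ = 1/2 the exact values reduce to the Legendre values $P_2(0) = -1/2$ and $P_4(0) = 3/8$. The ratio of exact to approximate magnitude approaches 1 as n grows.

```python
import math

def gegenbauer_at_zero(n, mu):
    """Exact C_n^mu(0) for mu > 0: zero for odd n, (-1)**m (mu)_m / m! for n = 2m."""
    if n % 2:
        return 0.0
    m = n // 2
    magnitude = math.exp(math.lgamma(m + mu) - math.lgamma(mu) - math.lgamma(m + 1))
    return (-1) ** m * magnitude

def endfire_approx(n, mu):
    """Endfire approximation: |C_n^mu(0)| ~ 2**(1 - mu) * n**(mu - 1) / Gamma(mu)."""
    return 2.0 ** (1.0 - mu) * n ** (mu - 1.0) / math.gamma(mu)

for n in (10, 100, 1000):
    print(n, abs(gegenbauer_at_zero(n, 0.5)) / endfire_approx(n, 0.5))
```

The printed ratios illustrate the $n^{\mu-1}$ scaling behind the n dependence noted above. At fixed μ, the endfire level changes with the array length.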

An important observation based on (35) and (39) is that for μ < 0 the sidelobes may well steadily increase as u approaches endfire. That this is in fact the case is borne out by the examples given later.

It should be emphasized that although the Gegenbauer weights must be positive if 0 < μ < μ*, they might not necessarily be positive if μ < 0 or if μ > μ*. For μ < 0 it can happen that all are positive, or that some are negative. Only numerical computation can show which is the case. If some of the weights are negative, it becomes a possibility that the maximum response might not occur for u = 0.

For the Gegenbauer weights it is readily shown that a sufficient condition for the MRA to be at $u = 0$ is that $|C_n^{\mu}(x)|$ attain its maximum over the interval $[-1, 1]$ at $x = 1$. By a well-known result [1, eq. (7.33.1)], the maximum of $|C_n^{\mu}(x)|$ over $[-1, 1]$ occurs at $x = 1$ if and only if $\mu > 0$. Thus, a sufficient condition for $u = 0$ to be the MRA is that $\mu > 0$. For $\mu < 0$ the MRA depends on the size of $x_0$ and must be verified numerically. From [1, eq. (7.33.1)] the maximum of $|C_n^{\mu}(x)|$ occurs at or near $x = 0$ when $\mu < 0$; therefore, if the MRA is not at $u = 0$, then the MRA must be at or near endfire. This observation is rendered quite reasonable when considered in the light of the examples presented later. This author has never experienced a case where the MRA was not $u = 0$ for $\mu > -1/2$ and reasonable values of

1 li¬ 
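The endpoint-versus-interior behaviour attributed to [1, eq. (7.33.1)] is easy to observe numerically. The sketch below is our own illustration, not part of the paper: it evaluates $C_n^{\mu}(x)$ with the standard three-term recurrence (an assumed evaluation method; any stable one would do) and reports where $|C_n^{\mu}(x)|$ peaks on a grid over $[-1, 1]$. The names `gegenbauer` and `argmax_abs` are hypothetical.

```python
import math


def gegenbauer(n, mu, x):
    """Evaluate C_n^mu(x) by the standard three-term recurrence (assumes mu != 0)."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * mu * x
    for k in range(1, n):
        # (k + 1) C_{k+1} = 2 x (k + mu) C_k - (k - 1 + 2 mu) C_{k-1}
        c_prev, c = c, (2.0 * x * (k + mu) * c - (k - 1.0 + 2.0 * mu) * c_prev) / (k + 1)
    return c


def argmax_abs(n, mu, samples=4001):
    """Return the grid point in [-1, 1] where |C_n^mu(x)| is largest (first one on ties)."""
    xs = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return max(xs, key=lambda x: abs(gegenbauer(n, mu, x)))


if __name__ == "__main__":
    for mu in (0.4, 0.2, -0.2, -0.4):
        print(f"mu = {mu:+.1f}: |C_10^mu(x)| peaks at x = {argmax_abs(10, mu):+.4f}")
```

For the positive values the peak sits at $x = \pm 1$; for the negative values (with this even degree) it sits at $x = 0$, the point that corresponds to endfire if the pattern is $C_n^{\mu}(x_0\cos(\pi u/2))$. Whether the overall pattern maximum is still at $u = 0$ then depends on how far $x_0$ exceeds 1, as the text states.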
It would be interesting to know how much energy is contained in the main lobe of a Gegenbauer design. From (20) and (30), this requires a tractable form for the integral

$$\int_0^{u_0} \bigl[C_n^{\mu}\bigl(x_0 \cos(\pi u/2)\bigr)\bigr]^2\,du \qquad (40)$$

which we do not have. On the other hand, the total "weighted" energy contained in all of the sidelobes is the smallest possible




IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-32, NO. 1, FEBRUARY 1984


for $\mu > -1/2$. Specifically, if $\pi_{n-1}$ denotes a polynomial of degree at most $n - 1$, then [1, eq. (4.7.15)], for $\mu > -1/2$,


I  (I  -  |t"  -  ir„  - ,  (A)|2  dx 


V(n  +  2p) 

2  i«+/i)n«+D 


(n#0). 


+  ')(*"- 


Substituting $x = x_0 \cos(\pi u/2)$ thus establishes our claim. However, a problem with this formulation is that part of the main lobe energy is included in the total weighted sidelobe energy. The reason is that the $x$-interval from the largest zero of $C_n^{\mu}(x)$ up to $+1$ is transformed [use (12) and (30)] to the $u$-interval


rif  m  +  2 

x  =  —  + - 

2L  2 


Furthermore, if $\pi_{n-1}(x)$ is the minimizing polynomial, then [1, eq. (4.7.9)]


-i)f] 


C(m  =  f”("*l)“/!  F(u).  (501 

From (20), for the Gegenbauer weights,

e»L<n.WnG(v)=CM^  co$  ^  (51) 

In order to take the limit in (51) as $n \to \infty$, we need to establish the asymptotic behavior


which is a subset of the main lobe region. For the Dolph-Chebyshev case $\mu = 0$, this $u$-interval goes from the first null up to the point on the main lobe equal to the overall sidelobe level and, so, is small. For larger values of $\mu$, this $u$-interval grows because of (27) and thus contributes progressively more significant portions to the weighted sidelobe energy estimate.

III. Gegenbauer Weights for a Continuous Aperture

The Gegenbauer weights derived for the discrete finite aperture have a limiting form as $n \to \infty$ with total aperture length $2L$ held constant. This is essentially the high-frequency limit of the weights as functions of design frequency. The limiting form is a continuous real-valued function defined on the whole aperture and must be nonnegative if $0 < \mu < \mu^*$. The case $\mu = 0$ develops $\delta$-function spikes at the aperture endpoints; i.e., the case $\mu = 0$ gives the van der Maas function. For $\mu > \mu^*$ the limit is still continuous, but we cannot guarantee by simple inspection that it is nonnegative across the entire aperture. For $\mu < 0$, the integral (60) below diverges.

Let the continuous aperture be taken to be the closed interval $[-L, L]$ on the $x$-axis. Rewriting (3) gives

$$F(u) = \int_1^N W_0(x)\,\exp(-i\pi x u)\,dx \qquad (43)$$


,  2  sec  (~ )  •  « 


where $\tau$ is defined by (38). The proof uses the asymptotic results

$$r_0 = \cosh\left(\frac{1}{n}\operatorname{arccosh} r\right) \approx 1 + \frac{(\operatorname{arccosh} r)^2}{2n^2}, \qquad n \to \infty \qquad (54)$$

a  sec  ^  arccosh  rj ,  n-*“>  (55) 

and 

xhu)  *  cos  ,  n  -  •».  (56) 

Apparently (54) was first given in [6]; it follows directly from the definition of the Chebyshev polynomials and the fact that $r > 1$. On the other hand, (56) follows from the Mehler-Heine result, (A2) of the Appendix, by specializing it to the Gegenbauer polynomials using [1, eq. (4.7.1)]. Now, from (30),

z„  a  sec  ^  arccosh  rj  cos^  j  sec  (~^~j  ■  n  -* » 


$$W_0(x) = \sum_{k=1}^{N} w_k\,\delta(x - k).$$


(The integral in (43) includes all of the impulses at 1 and $N$.)

Scaling the interval $[1, N]$ to the given aperture $[-L, L]$ and using the fact that the weights $\{w_k\}_1^N$ are symmetric gives

fL  W(f)  cos  (ft;)  di  (45)

with $\tau'$ defined by (53). We point out that if $\tau'$ is pure imaginary, then the hyperbolic secant can replace the secant in (52). The possibility of imaginary $\tau'$ does not affect the validity of the following argument.

Finally, from (51), normalizing by the factor $n^{1-2\mu}/(2\mu)$ to




STREIT: TWO-PARAMETER FAMILY OF WEIGHTS


keep $G(v)$ bounded gives

Hiv)  ~  bn  ~^-ei2Lln'u“lnG(v) 
~  2  m 


Note that the beam pattern function (58) is a well-defined function of $v$ for all real and complex values of $\mu$ (in fact, it is an entire function of $v$ for all $\mu$), so that it can be computed and inspected in the absence of any corresponding weighting function. In particular, for negative $\mu$ the beam pattern function (58) grows with increasing $v$, just as might be expected from the discrete aperture case. However, the beam pattern (58) for $\mu < 0$ is not realizable as the cosine transform of a continuous function on the closed interval, or aperture, $[-L, L]$.


_  -/p-i/tiz-v^ -t'm 

-  ;“r(p  + 1)  (Tv7  -  t'2  )"' 1/2 

where (58) is merely (A6) of the Appendix. Thus, (58) gives the beam pattern of the continuous Gegenbauer weighting function on the interval $[-L, L]$. The first null of $H(v)$ is

e0  =  ~  [(arccosh  r)2  +  n2  /4) 1/2  (59) 


which is derived from (58) by using (55). Note that $v_0$ is independent of $\mu$ because of (30).
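Assuming (54) is the familiar Dolph-Chebyshev relation, in which the point $x > 1$ with $T_n(x) = r$ is $\cosh(n^{-1}\operatorname{arccosh} r) \approx 1 + (\operatorname{arccosh} r)^2/(2n^2)$, its large-$n$ form can be checked numerically. The sketch below is illustrative only: the function names and the 30 dB ratio are our own choices, not values from the paper.

```python
import math


def chebyshev_t(n, x):
    """T_n(x) for real x, using the cos / cosh representations."""
    if abs(x) <= 1.0:
        return math.cos(n * math.acos(x))
    if x > 1.0:
        return math.cosh(n * math.acosh(x))
    return (-1) ** n * math.cosh(n * math.acosh(-x))


def dolph_x0(n, r):
    """Point x0 > 1 with T_n(x0) = r, for a main-lobe-to-sidelobe ratio r > 1."""
    return math.cosh(math.acosh(r) / n)


def dolph_x0_asymptotic(n, r):
    """Large-n approximation 1 + (arccosh r)^2 / (2 n^2)."""
    return 1.0 + math.acosh(r) ** 2 / (2.0 * n ** 2)


if __name__ == "__main__":
    ratio = 10 ** (30 / 20)  # hypothetical 30 dB sidelobe ratio
    for n in (9, 99, 999):
        exact = dolph_x0(n, ratio)
        approx = dolph_x0_asymptotic(n, ratio)
        print(n, exact, approx, exact - approx)
```

The difference behaves like $(\operatorname{arccosh} r)^4/(24 n^4)$, the next term of the cosh series, so it shrinks by roughly four orders of magnitude for each tenfold increase in $n$.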

The beam pattern (58) is easier to derive than the continuous Gegenbauer weighting function. Although one can find the Fourier transform of (58) as a special case of Sonine's second finite integral, (A1), the assertion that this transform is indeed the limit of the Gegenbauer weights for a discrete aperture requires a separate proof: conceivably the Gegenbauer weights might diverge even though the limit (58) exists. This in fact happens only for $\mu < 0$. The proof occupies about half of the Appendix; see especially (A8), (A22), (A26), (A27), and (A29). The final answer can be found by specializing (A29), using (A25), to yield


«<u)  = 


— ' -  f  (VP T3)"-1 

♦  1  )UrY- 1  J0  * 

lu-  i  (/.r'  Vl  -  V  )  cos  Lvl  (60) 


2*T(p 


The  continuous  Gegenbauer  weighting  function  on  the  aperture 
is obvious on setting f = (.( The continuous Gegenbauer function depends on the parameter $\mu$, which we must restrict to $\mu > 0$ for the integral to converge [see (A23)]. It also depends on the beamwidth parameter $x_0$ through the variable $\tau'$ defined by (53).

The Kaiser-Bessel window is a special case of (60), as is easily seen by setting $\mu = 1$. Since the Gegenbauer polynomials $C_n^{\mu}(x)$ for $\mu = 1$ are, from (2), the Chebyshev polynomials of the second kind, it is clear that Kaiser-Bessel must be their continuous analog. Also, our claim that the van der Maas weighting function is a limiting case of (60) as $\mu \to 0$ can be seen from


/  Jy  \ 

lim  x“ "  1  i  (ar)  =  — - +  2S(ar).  (61) 

u  -*  <>♦  •* 

Substituting $L\tau'\sqrt{1 - \xi^2}$ for $x$ in (61) and then substituting in (60) yields the van der Maas function. The result (61) was pointed out to the author by A. H. Nuttall in a private communication [7] while the present paper was being drafted.
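The identity $C_n^1(x) = U_n(x)$ used in this argument can be confirmed numerically. This sketch, which is ours rather than the paper's, compares the Gegenbauer three-term recurrence at $\mu = 1$ with the trigonometric form $U_n(\cos t) = \sin((n+1)t)/\sin t$; the function names are hypothetical.

```python
import math


def gegenbauer(n, mu, x):
    """Evaluate C_n^mu(x) by the standard three-term recurrence (assumes mu != 0)."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * mu * x
    for k in range(1, n):
        # (k + 1) C_{k+1} = 2 x (k + mu) C_k - (k - 1 + 2 mu) C_{k-1}
        c_prev, c = c, (2.0 * x * (k + mu) * c - (k - 1.0 + 2.0 * mu) * c_prev) / (k + 1)
    return c


def chebyshev_u(n, x):
    """U_n(x) for -1 < x < 1 from U_n(cos t) = sin((n + 1) t) / sin t."""
    t = math.acos(x)
    return math.sin((n + 1) * t) / math.sin(t)


if __name__ == "__main__":
    for n in range(6):
        print(n, gegenbauer(n, 1.0, 0.3), chebyshev_u(n, 0.3))
```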


IV. EXAMPLES

The five examples presented here are for the discrete aperture with 100 elements at half-wavelength spacing, steered broadside. The half beamwidth, measured from the MRA to the first null, is 2.565588° and is the same for all five examples. This is accomplished by defining $x_0$ as in (30) and computing it in the manner described in detail in Section II, (23)-(28). The remaining free parameter, $\mu$, is taken equal to 0.4, 0.2, 0.0, -0.2, and -0.4, successively. The Gegenbauer weights are computed in the suggested form (15), and the resulting beam patterns for these five values of $\mu$ are given in Figs. 1-5, respectively. The independent variable in these patterns is the angle $\theta_0$, not $u$; the vertical axis is $20 \log_{10} |F(\sin\theta_0)|$.

Perhaps the most prominent feature of these five beam patterns is that the sidelobe structure for a fixed positive value of $\mu$ is "reciprocal" to that for $-\mu$. Consider $\mu = \pm 0.4$, for instance. If the reader takes a Xerox of both beam patterns, turns one of them upside down on top of the other (literally), and holds the pair up to the light, then it will be abundantly clear what "reciprocal" means in this context. The cause of this attractive matching of sidelobe envelopes is that the bound (35) is, in fact, very reflective of the true sidelobe taper. Thus, for positive $\mu$ the sidelobes decay, while for negative $\mu$ the sidelobes grow. For $\mu = 0$ the sidelobes neither grow nor decay: they remain constant. The case $\mu = 0$ is, of course, the Dolph-Chebyshev design. The author has not undertaken any further studies to determine the accuracy of the sidelobe envelope factor.

Another important feature is that the first sidelobe alone seems to be extremely important in determining the possible size of the remaining sidelobes. Although this is not a rigorous statement, it does seem to be borne out by these examples. For $\mu = 0.2$ the first sidelobe is increased by about 1 dB to -29 dB, the second sidelobe seems unchanged at -30 dB, and all the remaining sidelobes are uniformly (and progressively) lower than in the -30 dB Dolph-Chebyshev case ($\mu = 0$), with the last sidelobe at about -34 dB. Similar but "reciprocal" remarks hold for the $\mu = -0.2$ case. For $\mu = 0.4$ ($\mu = -0.4$) the second sidelobe is slightly higher (lower) than -30 dB, but the point made here is still substantially true.

The weights for the cases $\mu = 0.4$, 0.2, and 0.0 are all positive. For the cases $\mu = -0.2$ and $-0.4$, the only negative weights correspond to the elements adjacent to the end elements.

All five examples have 49 sidelobes on either side of the MRA. This can be attributed to the fact that the Gegenbauer polynomial $C_n^{\mu}(x)$ has all its $n$ zeros in the open interval $(-1, +1)$ when $\mu > -1/2$. Thus, from (20), $F(u)$ must have $N - 1 = 99$ zeros in the open $u$-interval $(0, 2)$. By Rolle's theorem of elementary calculus, $F(u)$ must have 98 points (i.e., sidelobe peaks) interior to $(0, 2)$ where $F'(u) = 0$. Since $|F(u)|$ is an even function of $u$, half of these sidelobes must be on each side of the MRA.

[Figures: beam patterns; horizontal axis, angle from look direction (deg)]
Fig. 1. Gegenbauer 100 element array; $\mu = 0.4$; first null = 2.565588°.
Fig. 3. Gegenbauer 100 element array; $\mu = 0$; first null = 2.565588°. (This is classic Dolph-Chebyshev.)
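The zero-counting argument can be reproduced numerically. The sketch below is an illustration under our own assumptions, not code from the paper: it counts sign changes of $C_n^{\mu}(\cos t)$ on a midpoint grid in $t \in (0, \pi)$, which counts the zeros of $C_n^{\mu}(x)$ in $(-1, 1)$. If, as we assume here, (20) has the form $F(u) = C_n^{\mu}(x_0\cos(\pi u/2))$ with $x_0 > 1$, these are exactly the zeros of $F$ in $(0, 2)$. The grid size and function names are hypothetical choices.

```python
import math


def gegenbauer(n, mu, x):
    """Evaluate C_n^mu(x) by the standard three-term recurrence (assumes mu != 0)."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * mu * x
    for k in range(1, n):
        # (k + 1) C_{k+1} = 2 x (k + mu) C_k - (k - 1 + 2 mu) C_{k-1}
        c_prev, c = c, (2.0 * x * (k + mu) * c - (k - 1.0 + 2.0 * mu) * c_prev) / (k + 1)
    return c


def count_zeros(n, mu, samples=5000):
    """Count sign changes of C_n^mu(cos t) on a midpoint grid over (0, pi)."""
    count = 0
    prev = None
    for i in range(samples):
        val = gegenbauer(n, mu, math.cos(math.pi * (i + 0.5) / samples))
        if prev is not None and val * prev < 0.0:
            count += 1
        prev = val
    return count


if __name__ == "__main__":
    n = 99  # N = 100 elements
    for mu in (0.4, 0.2, -0.2, -0.4):
        zeros = count_zeros(n, mu)
        peaks = zeros - 1  # Rolle: one extremum strictly between consecutive zeros
        print(f"mu = {mu:+.1f}: {zeros} zeros, {peaks} sidelobe peaks, {peaks // 2} per side")
```

The midpoint grid avoids landing exactly on the zero at $t = \pi/2$ that every odd-degree polynomial has, so a sign test is sufficient.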

All  five  examples  exhibit  a  plateau  in  the  decay,  or  growth, 
of  sidelobes  at  sufficiently  great  distances  from  the  MRA.  This 
feature  is  also  an  artifact  of  the  sidelobe  envelope  factor  (35). 

Taken together, these examples indicate that the ratio (39) is, on a log plot, roughly linear in $\mu$ for fixed $n$ and beamwidth parameter $x_0$. Whether this linearity is true only for reasonably small values of $\mu$ has not been determined. A careful mathematical proof of the approximate $\mu$-linearity of the logarithm of (39) would be nice to have.

[Figure: beam pattern; horizontal axis, angle from look direction (deg)]
Fig. 5. Gegenbauer 100 element array; $\mu = -0.4$; first null = 2.565588°.

V. Discussion and Summary
The Gegenbauer weighting functions for the discrete and continuous aperture, as well as for nonrecursive digital filters, permit the designer to maintain a fixed specified beamwidth as




defined via (30) while scanning continuously in $\mu$ to discriminate against spatially distributed noise sources and/or extraneous signals by tapering the sidelobes. The required weights can be calculated quickly and accurately by the analytic formulas provided here; hence, it might be possible to choose $\mu$ adaptively to achieve some objective such as maximizing signal-to-noise ratio. The beam patterns for negative $\mu$ are particularly interesting in that it may be possible to discriminate against noise sources that lie near (in bearing) the desired signal source and thereby enhance tracking capability.

One advantage of the Gegenbauer weights is that they are derived exactly for a discrete aperture, and the continuous aperture weighting function is then discovered as their limit. If only a continuous aperture function is defined, then it must be sampled at a finite set of points in any application to a discrete aperture. How this sampling is best done is not commonly discussed, and it leaves a certain ambiguity in the discrete aperture weights. The discrete Gegenbauer weights given by (18) and (19) above do not have this problem.

When steering a Gegenbauer array design, no problems should arise other than those normally expected in the usual Dolph-Chebyshev design. Gegenbauer designs can be steered nearly to endfire before encountering the first grating lobe.

A difference beam pattern can be constructed from the Gegenbauer weights in the usual way, by changing the signs of the weights on one half of the array. If this is done, the difference beam pattern is proportional to $|C_n^{\mu}(x_0 \sin(\pi u/2))|$. This is easy to show from the constructions (18)-(20). The result is a beam pattern with a null at $u = 0$.

All the nulls of the Gegenbauer beam pattern seem to shift strictly away from the MRA as $\mu$ increases. This effect is evident in the examples. It is quite possible to use this effect to control null placement deliberately, so as to cancel localized noise sources. A mathematical proof that the nulls must shift in this manner requires knowledge of the relative size of the derivatives (with respect to $\mu$) of all of the zeros of $C_n^{\mu}(x)$. Although this information is not known to the author, it is not really necessary to have it in order to utilize the null-shifting effect in practice.
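Although no proof is offered, the null-shifting effect is easy to explore numerically. The sketch below is our own illustration: it locates the positive zeros of $C_n^{\mu}(x)$ by a grid scan plus bisection, then chooses $x_0$ so that the largest zero lands at a fixed first null $u_0$ (mimicking the role of (30), under the assumption that the pattern is $C_n^{\mu}(x_0\cos(\pi u/2))$), and reports where the remaining nulls fall. The value $u_0 = 0.1$ and all function names are hypothetical.

```python
import math


def gegenbauer(n, mu, x):
    """Evaluate C_n^mu(x) by the standard three-term recurrence (assumes mu != 0)."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * mu * x
    for k in range(1, n):
        # (k + 1) C_{k+1} = 2 x (k + mu) C_k - (k - 1 + 2 mu) C_{k-1}
        c_prev, c = c, (2.0 * x * (k + mu) * c - (k - 1.0 + 2.0 * mu) * c_prev) / (k + 1)
    return c


def gegenbauer_zeros(n, mu, samples=4000):
    """Zeros of C_n^mu in (-1, 1), largest first, via a grid scan in t = arccos x plus bisection."""
    def f(t):
        return gegenbauer(n, mu, math.cos(t))

    roots = []
    ts = [math.pi * (i + 0.5) / samples for i in range(samples)]
    for a, b in zip(ts, ts[1:]):
        fa, fb = f(a), f(b)
        if fa * fb < 0.0:
            for _ in range(60):  # bisection on t keeps the sign change bracketed
                m = 0.5 * (a + b)
                fm = f(m)
                if fa * fm <= 0.0:
                    b = m
                else:
                    a, fa = m, fm
            roots.append(math.cos(0.5 * (a + b)))
    return roots


def null_positions(n, mu, u0):
    """Nulls u in (0, 1) when x0 is chosen so that the largest zero maps to u0."""
    zeros = [z for z in gegenbauer_zeros(n, mu) if z > 0.0]
    x0 = zeros[0] / math.cos(math.pi * u0 / 2.0)
    return [2.0 / math.pi * math.acos(z / x0) for z in zeros]


if __name__ == "__main__":
    for mu in (0.2, 0.4):
        print(mu, [round(u, 5) for u in null_positions(20, mu, 0.1)])
```

With the first null pinned at $u_0$, comparing the printed lists for the two values of $\mu$ shows directly whether the higher-order nulls move outward, which is the behaviour the text reports from the examples.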

The Gegenbauer weights for discrete and continuous apertures were derived by the author between March and May 1981. The mathematical results contained in the Appendix first appeared in [11].


For the case of real $\mu$ and $\lambda$, a fourth proof is given here that depends in an essential way on the identity (A7). In this connection, the particular form of the coefficients $a_{k,n}(y)$ is important; that is, the easily derived identity (A10) does not seem to be at all useful, but the identity (A8) is exactly what is needed. It facilitates the investigation of the limiting form (A27) of $a_{k,n}(y)$ as $n$ tends to infinity. The identity (A8) is apparently new; however, the special case $y = 1$ was known to Gegenbauer.

Equation (A8) is interesting in another regard as well. A simple inspection suffices to prove that $a_{k,n}(y) > 0$ for all $n$ and $k$ whenever $y > 1$ and $\mu > \lambda > 0$. The coefficients remain positive in the two limiting cases $\mu > 0$, $\lambda = 0$ and $\mu = \lambda = 0$, as can be seen from (A18)-(A21). In fact, it was only this positivity result that the author originally sought.

The result (A3) of the Mehler-Heine type is apparently new. It is needed to prove (A1) by our methods. It has additional interest in that it reduces to the result (A2) given by Szegő simply by setting $y = 0$. Mathematically, however, (A2) and (A3) are equivalent. The special cases (A4a) and (A4b), involving Chebyshev polynomials, are particularly striking.

Let $\alpha$ and $\beta$ be arbitrary real numbers. For any complex number $x$, the Mehler-Heine theorem states that

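$$\lim_{n\to\infty} n^{-\alpha} P_n^{(\alpha,\beta)}\left(\cos\frac{x}{n}\right) = \left(\frac{x}{2}\right)^{-\alpha} J_\alpha(x) \qquad (A2)$$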

where $J_\alpha(x)$ is the Bessel function of the first kind of order $\alpha$ [1, eq. (1.71.1)], [8, sect. 3.1(8)]. A straightforward proof of (A2) can be found in Szegő [1, Theorem 8.1.1]. Szegő's proof can be readily modified to show that

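$$\lim_{n\to\infty} n^{-\alpha} P_n^{(\alpha,\beta)}\left(\frac{\cos(x/n)}{\cos(y/n)}\right) = \left(\frac{\sqrt{x^2 - y^2}}{2}\right)^{-\alpha} J_\alpha\bigl(\sqrt{x^2 - y^2}\bigr) \qquad (A3)$$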

for all complex $x$ and $y$. Like the Mehler-Heine result, this formula holds uniformly for $x$ and $y$ in every bounded region of the complex plane. The special case $\alpha = \beta = -1/2$ gives the interesting result


Appendix 

Mathematical  Derivations  and  Results 
Sonine's second finite integral [8, p. 376] may be written

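$$\int_0^{\pi/2} J_\mu(x\sin\theta)\,J_\lambda(y\cos\theta)\,\sin^{\mu+1}\theta\,\cos^{\lambda+1}\theta\,d\theta = \frac{x^{\mu}\,y^{\lambda}\,J_{\mu+\lambda+1}\bigl(\sqrt{x^2+y^2}\bigr)}{(x^2+y^2)^{(\mu+\lambda+1)/2}} \qquad (A1)$$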

for all complex $x$ and $y$, and is valid when both $\mathrm{Re}(\mu) > -1$ and $\mathrm{Re}(\lambda) > -1$. At least three proofs of this result are known. One involves expanding the integral in powers of $x$ and $y$; another involves integration over subsets of the surface of the unit sphere in R1. Both are given in [8]. The third proof, using the generalized Laguerre polynomials $L_n^{(\alpha)}(x)$, is mentioned in [12].
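As an independent sanity check on (A1), separate from any of the proofs mentioned, both sides can be compared numerically. The sketch below is ours: it evaluates $J_\nu$ from its power series (adequate only for moderate arguments) and the left-hand integral with composite Simpson's rule. The parameter values, quadrature size, and function names are hypothetical choices.

```python
import math


def bessel_j(nu, z, terms=60):
    """J_nu(z) from its power series; adequate for moderate z >= 0 and nu > -1."""
    total = 0.0
    for m in range(terms):
        total += (-1) ** m * (z / 2.0) ** (2 * m + nu) / (math.factorial(m) * math.gamma(m + nu + 1.0))
    return total


def simpson(f, a, b, intervals=2000):
    """Composite Simpson rule on [a, b] (intervals must be even)."""
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def sonine_lhs(mu, lam, x, y):
    """Left-hand side of (A1) by numerical quadrature over [0, pi/2]."""
    def integrand(t):
        s, c = math.sin(t), max(0.0, math.cos(t))  # clamp rounding noise near t = pi/2
        return bessel_j(mu, x * s) * bessel_j(lam, y * c) * s ** (mu + 1) * c ** (lam + 1)
    return simpson(integrand, 0.0, math.pi / 2.0)


def sonine_rhs(mu, lam, x, y):
    """Right-hand side of (A1): x^mu y^lam J_{mu+lam+1}(r) / r^(mu+lam+1), r = sqrt(x^2 + y^2)."""
    r = math.hypot(x, y)
    return x ** mu * y ** lam * bessel_j(mu + lam + 1.0, r) / r ** (mu + lam + 1.0)


if __name__ == "__main__":
    for params in [(0.5, 1.0, 2.0, 1.5), (0.0, 0.0, 1.0, 2.0)]:
        print(params, sonine_lhs(*params), sonine_rhs(*params))
```

The parameter pairs were picked so that the integrand is smooth at both endpoints; for orders near $-1$ the endpoint singularities would call for a more careful quadrature.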


$$\lim_{n\to\infty} T_n\left(\frac{\cos(x/n)}{\cos(y/n)}\right) = \cos\sqrt{x^2 - y^2} \qquad (A4a)$$


where $T_n(x)$ is the Chebyshev polynomial of the first kind [1, eq. (4.1.7)], while the special case $\alpha = \beta = 1/2$ gives


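$$\lim_{n\to\infty} n^{-1} U_n\left(\frac{\cos(x/n)}{\cos(y/n)}\right) = \frac{\sin\sqrt{x^2 - y^2}}{\sqrt{x^2 - y^2}} \qquad (A4b)$$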


where $U_n(x)$ is the Chebyshev polynomial of the second kind [1, eq. (4.1.7)]. These follow from (A3) by using Stirling's




formula and the well-known results [1, eq. (1.71.2)]


$$J_{-1/2}(z) = \left(\frac{2}{\pi z}\right)^{1/2}\cos z, \qquad J_{1/2}(z) = \left(\frac{2}{\pi z}\right)^{1/2}\sin z.$$
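The limits (A4a) and (A4b) can be watched converging numerically. The sketch below is our own illustration, restricted for simplicity to real $x > y \ge 0$ (so that the argument $\cos(x/n)/\cos(y/n)$ lies in $(-1, 1)$ for large $n$); it uses the trigonometric forms $T_n(\cos t) = \cos nt$ and $U_n(\cos t) = \sin((n+1)t)/\sin t$, and the function names are hypothetical.

```python
import math


def chebyshev_t(n, z):
    """T_n(z) for real z via the cos / cosh representations."""
    if abs(z) <= 1.0:
        return math.cos(n * math.acos(z))
    if z > 1.0:
        return math.cosh(n * math.acosh(z))
    return (-1) ** n * math.cosh(n * math.acosh(-z))


def chebyshev_u(n, z):
    """U_n(z) for -1 < z < 1 via U_n(cos t) = sin((n + 1) t) / sin t."""
    t = math.acos(z)
    return math.sin((n + 1) * t) / math.sin(t)


def a4_left_sides(n, x, y):
    """Finite-n left-hand sides of (A4a) and (A4b); assumes real x > y >= 0."""
    z = math.cos(x / n) / math.cos(y / n)
    return chebyshev_t(n, z), chebyshev_u(n, z) / n


def a4_limits(x, y):
    """Right-hand sides of (A4a) and (A4b) for real x > y >= 0."""
    s = math.sqrt(x * x - y * y)
    return math.cos(s), math.sin(s) / s


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, a4_left_sides(n, 3.0, 1.0), a4_limits(3.0, 1.0))
```

The $T_n$ version converges like $n^{-2}$, while the $n^{-1}U_n$ version converges only like $n^{-1}$, because $\sin((n+1)t)$ carries an extra phase of order $1/n$.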


We will need another special case of the general result, specifically, for $\mu > -1$,


1-2*  /cOS  — 

lim  -Z  —  Cg  l - - 

\  y 

\  cos  — 
\  n 


where $C_n^{\mu}(x)$ are the ultraspherical, or Gegenbauer, polynomials [1, eq. (4.7.1)]. (Szegő uses the notation $P_n^{(\mu)}(x)$ instead of $C_n^{\mu}(x)$.)

We derive Sonine's second finite integral by finding an alternate form for the left-hand side of (A6). This requires the following result. For $\mu > \lambda > 0$, the coefficients $a_{k,n}(y)$ in the expansion


$$C_n^{\mu}(xy) = \sum_{k=0}^{\lfloor n/2\rfloor} a_{k,n}(y)\,C_{n-2k}^{\lambda}(x), \qquad n = 0, 1, 2, \ldots \qquad (A7)$$

are  given  explicitly  by 


*  _  In  -  2k  +  X)  y"~ lm 

y  Mr— m — vr";  - <AI0) 

m!  (*  -  III)!  , 

=  »•"■“  («-  2k  +  \)Qk(2v2  -  1)  (All) 

where $Q_k$ is a polynomial defined for general complex argument $u$ by

„  .  *  (-O'"  (<4-m _ (**  1\*"" 

k“  \  2  / 


For arbitrary $\alpha$ and $\beta$, the Jacobi polynomial of degree $k \ge 0$ can be written


(k  +  a  +  0+  l)t.m  (/c  -  m  +  0+1)" 
m!  (k  -  m)! 


mk 


which follows from [1, eq. (4.21.2)] using the identity [1, eq. (4.1.3)]. Setting $\alpha = -\lambda - 1$ and $\beta = \lambda + n - 2k$ in (A13) shows that

&(“)  =  rlf-— ~  ■■»•"- »»>(„)  (A14) 

(X)n-*.  i 

Expanding the Jacobi polynomial in (A14) using [1, eq. (4.3.2)]


fl*.nO')  =  (M  *  -k  +  M(P)„-k 


(l-to^d-tg). 


m!  (*  -  m)!  (1  +  n)m  (I  +  0)k. 


£  (P-  *  +  M)*-m(va  -  1  r/-1" 
m-o  m!  (*  -  m)!  (X)n_*-m,  , 


and substituting $u = 2y^2 - 1$ gives


where we take $0^0 = 1$ and $(0)_0 = 1$ whenever they occur. Setting $y = 1$ in (A8) gives

$$a_{k,n}(1) = \frac{(n - 2k + \lambda)\,(\mu)_{n-k}\,(\mu - \lambda)_k}{k!\,(\lambda)_{n-k+1}} \qquad (A9)$$

which is due to Gegenbauer [1, eq. (4.10.27)]. Furthermore, for real $y > 1$ and $\mu > \lambda > 0$, the coefficients $a_{k,n}(y)$ are all positive, as can be seen by inspection in (A8).
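Gegenbauer's formula (A9), i.e. (A7) at $y = 1$, can be spot-checked numerically. The sketch below is our own illustration, not part of the derivation: it builds $\sum_k a_{k,n}(1)\,C_{n-2k}^{\lambda}(x)$ from (A9) and compares it with $C_n^{\mu}(x)$ computed by the three-term recurrence; it also confirms that every coefficient is positive when $\mu > \lambda > 0$. The helper names and test values are hypothetical.

```python
import math


def poch(a, k):
    """Pochhammer symbol (a)_k = a (a + 1) ... (a + k - 1)."""
    p = 1.0
    for i in range(k):
        p *= a + i
    return p


def gegenbauer(n, mu, x):
    """Evaluate C_n^mu(x) by the standard three-term recurrence (assumes mu != 0)."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * mu * x
    for k in range(1, n):
        # (k + 1) C_{k+1} = 2 x (k + mu) C_k - (k - 1 + 2 mu) C_{k-1}
        c_prev, c = c, (2.0 * x * (k + mu) * c - (k - 1.0 + 2.0 * mu) * c_prev) / (k + 1)
    return c


def a_kn_at_one(k, n, mu, lam):
    """Coefficient (A9): (n - 2k + lam) (mu)_{n-k} (mu - lam)_k / (k! (lam)_{n-k+1})."""
    return ((n - 2 * k + lam) * poch(mu, n - k) * poch(mu - lam, k)
            / (math.factorial(k) * poch(lam, n - k + 1)))


def expand(n, mu, lam, x):
    """Right-hand side of (A7) at y = 1: sum_k a_{k,n}(1) C_{n-2k}^lam(x)."""
    return sum(a_kn_at_one(k, n, mu, lam) * gegenbauer(n - 2 * k, lam, x)
               for k in range(n // 2 + 1))


if __name__ == "__main__":
    mu, lam, x = 0.7, 0.3, 0.4
    for n in range(6):
        print(n, gegenbauer(n, mu, x), expand(n, mu, lam, x))
```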

The formula (A8) is derived as follows. Let $\mu > \lambda > 0$. In the expression [1, eq. (4.7.31)]


Qk(2y3  -  l)  =  (u)n-k  Z 


(n-  X  +  m)k . m  ( y  -  1  fy 


2  _  I  yr»  2*-  jm 


(A16) 


$$C_n^{\mu}(x) = \sum_{m=0}^{\lfloor n/2\rfloor} \frac{(-1)^m\,(\mu)_{n-m}}{m!\,(n-2m)!}\,(2x)^{n-2m}$$


m!(*-m)!(X)„.  v  / 

Thus, (A16) and (A11) establish (A8).
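The explicit sum [1, eq. (4.7.31)], which is the starting point of this derivation, can be checked against the three-term recurrence. The sketch below is our own illustration; the helper names and test values are hypothetical, and the check includes a negative $\mu$ because the explicit sum holds for any $\mu$.

```python
import math


def poch(a, k):
    """Pochhammer symbol (a)_k = a (a + 1) ... (a + k - 1)."""
    p = 1.0
    for i in range(k):
        p *= a + i
    return p


def gegenbauer(n, mu, x):
    """Evaluate C_n^mu(x) by the standard three-term recurrence (assumes mu != 0)."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * mu * x
    for k in range(1, n):
        # (k + 1) C_{k+1} = 2 x (k + mu) C_k - (k - 1 + 2 mu) C_{k-1}
        c_prev, c = c, (2.0 * x * (k + mu) * c - (k - 1.0 + 2.0 * mu) * c_prev) / (k + 1)
    return c


def gegenbauer_explicit(n, mu, x):
    """C_n^mu(x) = sum_m (-1)^m (mu)_{n-m} (2x)^{n-2m} / (m! (n-2m)!), [1, eq. (4.7.31)]."""
    return sum((-1) ** m * poch(mu, n - m) / (math.factorial(m) * math.factorial(n - 2 * m))
               * (2.0 * x) ** (n - 2 * m)
               for m in range(n // 2 + 1))


if __name__ == "__main__":
    for n in range(6):
        print(n, gegenbauer_explicit(n, 0.7, 0.4), gegenbauer(n, 0.7, 0.4))
```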

Two limiting cases of (A7) are easily derived from [1, eq. (4.7.8)]

$$\lim_{\mu\to 0} \frac{n}{2\mu}\,C_n^{\mu}(x) = T_n(x), \qquad n \ge 1 \qquad (A17)$$

and are worth recording. Thus, for $\mu > 0$,


CS(xy)=  *».n(y)rn.,»(x),  n  =  0,1,2, 
*-  o 


we replace $x$ with $xy$, substitute


(2x)"' 3m  _  l(»-2m„2l  (n  -  2m  +  X  -  2s)  . 

(n  -  2m)!  Z  ,!  (X)B. 


.  O*  ♦»">*-«  O'1  -  l)my"-Jm 

o*,«i(^)  -  2(w„_i  y  ~r—  —  — - — 

m.o  m  (*  -  m)!  (n  -  *  -  m)! 


and  collect  terms  to  get 


(A19) 




and 


$$T_n(xy) = \sideset{}{'}\sum_{k=0}^{\lfloor n/2\rfloor} c_{k,n}(y)\,T_{n-2k}(x), \qquad n = 1, 2, 3, \ldots \qquad (A20)$$

where 


Ck,n(v)  =  /i(n-  *  -  I)! 


I 


m  ■  o 


(m)k-m(>';  -  ir/'lm 

m'.  (X  -  m)\  (n  -  k  -  m)\ 


(A21) 

The notation $\sum'$ means that 1/2 of the last term in the sum is taken if $n$ is even, and all of it is taken if $n$ is odd. Inspection shows that $y > 1$ implies that $b_{k,n}(y)$ and $c_{k,n}(y)$ are positive.
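The limit (A17), which links the Gegenbauer family to $T_n$, is easy to see numerically by taking $\mu$ very small. The sketch below is our own illustration; the value $\mu = 10^{-9}$ and the helper names are hypothetical. The recurrence coefficient is written as $(k - 1) + 2\mu$ so that it stays exact at $k = 1$ when $\mu$ is tiny.

```python
def gegenbauer(n, mu, x):
    """Evaluate C_n^mu(x) by the standard three-term recurrence (assumes mu != 0)."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * mu * x
    for k in range(1, n):
        # (k + 1) C_{k+1} = 2 x (k + mu) C_k - (k - 1 + 2 mu) C_{k-1}
        c_prev, c = c, (2.0 * x * (k + mu) * c - (k - 1.0 + 2.0 * mu) * c_prev) / (k + 1)
    return c


def chebyshev_t(n, x):
    """T_n(x) by the recurrence T_{k+1} = 2 x T_k - T_{k-1} (any real x)."""
    if n == 0:
        return 1.0
    t_prev, t = 1.0, x
    for _ in range(1, n):
        t_prev, t = t, 2.0 * x * t - t_prev
    return t


if __name__ == "__main__":
    x = 0.3
    for mu in (1e-1, 1e-3, 1e-6):
        print(mu, 5 / (2 * mu) * gegenbauer(5, mu, x), chebyshev_t(5, x))
```

The printed pairs approach each other as $\mu \to 0$, with an error that shrinks roughly in proportion to $\mu$.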

Sonine's second finite integral is now derived from (A6). Fix $x$ and $y$. Let $A = [n/2]$. From (A7)


will be finite. If $f = \lim f_n$ and $g = \lim g_n$, the bounded convergence theorem [9, p. 110] implies


-  f)#d  - 


(A24) 


Let $\xi$ in $(0, 1)$ be rational. Then $1 - \xi = 2k/n$ for sufficiently large $k$ and $n$, so that

*(l  -  f)  =  lim  g„(l  -  f) 


=  lim 
«•*« 


(n-  2 X)‘~u 
2X 


I  *  S'"  2k/n 


_ X  -  |  f2  (tj[) 

2*r(x  +  i)(fx)A*,/l 


(A2SJ 


with the last step following immediately from (A6). Thus, (A25) holds for all $\xi$ in $[0, 1)$ by continuity. Similarly, from (A8) and for all rational $\xi$ in $(0, 1)$,


/(I  -  t)=  lim  /„(!  -  £) 

rt  —  — 


lim 
n~*  •- 
t  -f*2 kftt 


Xn1  ~  2|i(l  +  A) 
Pin  -  2 k)'~lK 


«*.» 


{■1'^cos"-m!(*-m)!(X+  !)„.». m 
n 


(A26) 


where we have defined for $0 < \xi < 1$


■*-  u{n  -  2*) 1 


*-  o 


*.<  i  -  n  ■  £  «- » («  i)  x«.  (i  - » 

*•0  \  fl  / 


and  is  the  characteristic  (indicator)  function  of  the  interval 


A* 


T  *  Hi 
LA  +  1  '  A  +  I 

r  *  hi 

La  + 1  ’  a  + 1 


-)• 

] 


*»0,  !,•••,  A-/ 


X  =  A. 


It  can  be  verified  that  X£*  (2 X/n)  =  1  for  X  =  0,  1 .  •  •  •  ,  A. 

Assume for the moment that both $|f_n(\xi)|$ and $|g_n(\xi)|$ are bounded above by integrable functions of $\xi$. To do this, it will be seen that we must restrict attention to $\lambda > -1/2$, $\mu > -1/2$, $\mu > \lambda$, so that the integral [1, eq. (1.7.4)]


$$\int_0^1 \xi^{2\lambda}(1 - \xi^2)^{\mu - \lambda - 1}\,d\xi = \frac{\Gamma(\lambda + 1/2)\,\Gamma(\mu - \lambda)}{2\,\Gamma(\mu + 1/2)} \qquad (A23)$$


Interchange the limit and the summation, and evaluate the limit of the $m$th term (convert Pochhammer symbols to gamma functions, apply Stirling's formula, and use $k(n - k) = (1 - \xi^2)n^2/4$) to obtain


A  i 


v  HHii 

ixm+d 

(TvVTTF)3m 

m!  IXf-  X  +  m) 

r(X+  q 

r(p+i) 

Vx-  ,  (WT  -  f3) 


(A27) 


where $I_\nu(z)$ denotes the modified Bessel function of the first kind of order $\nu$ (see [8, sect. 3.7(2)]). We must require $\mu > \lambda$ in (A27) to have convergence. Continuity again assures that (A27) holds for all $\xi$ in $(0, 1)$. Now, interchanging the limit and the sum was valid because an upper bound for the total sum can be found. Since the absolute value of the $m$th term in (A26) is bounded by

0  HX+  I)  gb'lVT-C2)3’" 

P(P+  1)  m!  P(P-  X  +  m) 




where 


B  =  - 


n  n 


-Uu-x-i) 


•  I  -  2\ 


n2m  |  cos" 

n 


-m 


V(k  +  m  -  X)  r \rt  -  k  +  u) 

l\k  -  m*l)  Rn  -fr  +  X+  l-  m) 


the  total  sum  in  (A26)  is  bounded  by 


RX  +  l ) 

rut+ 1) 


y  flwvQV; 

m-  o  m!  R/i  -  X  +  m) 


(A28) 


for some constant independent of $\xi$. The series in (A28) is a continuous function of $\xi$ on $[0, 1]$ if $\mu > \lambda$. Hence, from (A23), $F(\xi)$ is an integrable function that bounds $|f_n(\xi)|$ for all $n$.

From  (A24).  (A25).  and  (A27)  we  have 


t/ir/2 

i\»  +  l)xK-'/\vu-: 

I 

fX*l/2  (Vl  -  f2)" 


-X-  1 


■A- i/i  .  (J’VHI)4 


Vn/2 

2*T(<i+  I) 


A- 1<2  (V**  -y2) 
(V*:  -  y" 1/3 


(A29) 


with the last equation from (A6). Substituting $\xi = \sin\theta$ and $y = iy'$ in the last two formulas, and setting

$$\nu = \lambda - \tfrac{1}{2} > -1 \quad\text{and}\quad \lambda' = \mu - \lambda - 1 > -1 \qquad (A30)$$

yields Sonine's second finite integral (A1). The only thing left to prove is that $|g_n(\xi)|$ is bounded by an integrable function on $[0, 1)$. Szegő's argument [1, p. 192] in the proof of (A2) can be modified easily to show that $|g_n(\xi)|$ is bounded by a constant.

The proof of (A1) presented here was intentionally restricted to real $\mu$ and $\lambda$. However, it is not hard to see from (A23) and (A30) that the proof can be carried out for complex $\mu$ and $\lambda$, provided appropriate remarks about the complex case are made in the appropriate places. If such remarks are made, our derivation proves (A1) for $\mathrm{Re}(\mu) > -1$ and $\mathrm{Re}(\lambda) > -1$. Divergence of (A23) is seen to be the cause of the restrictions on $\mu$ and $\lambda$.

The material contained in this Appendix was first documented in [11].


References 

[1] G. Szegő, Orthogonal Polynomials, 4th ed. Amer. Math. Soc. Colloquium Pub., vol. 23, 1978.

[2] C. L. Dolph, "A current distribution for broadside arrays which optimizes the relationship between beam width and side-lobe level," Proc. IRE and Waves and Electrons, vol. 34, pp. 335-348, June 1946.

[3] G. J. van der Maas, "A simplified calculation for Dolph-Tchebycheff arrays," J. Appl. Phys., vol. 25, Jan. 1954.

[4] J. F. Kaiser, "Digital filters," in System Analysis by Digital Computer, F. F. Kuo and J. F. Kaiser, Eds. New York: Wiley, 1966, pp. 218-285.

[5] T. T. Taylor, "Design of line-source antennas for narrow beamwidth and low side lobes," IRE Trans. Antennas Propagat., vol. AP-3, pp. 16-28, Jan. 1955.

[6] R. J. Stegen, "Excitation coefficients and beamwidths of Tschebyscheff arrays," Proc. IRE, vol. 41, pp. 1671-1674, Nov. 1953.

[7] A. H. Nuttall, private communication, Naval Underwater Syst. Cen., New London, CT, Apr. 13, 1982.

[8] G. N. Watson, Theory of Bessel Functions, 2nd ed. New York: Cambridge Univ. Press, 1966.

[9] P. R. Halmos, Measure Theory. Princeton, NJ: Van Nostrand, 1950.

[10] G. A. Campbell and R. M. Foster, Fourier Transforms for Practical Applications. Princeton, NJ: Van Nostrand, 1948.

[11] R. L. Streit, "An expansion of the Gegenbauer polynomial $C_n^{\mu}(xy)$," NUSC Tech. Rep. 6579, Naval Underwater Syst. Cen., New London, CT, Mar. 25, 1982.

[12] R. Askey and J. Fitch, "Integral representations for Jacobi polynomials and some applications," J. Math. Anal. Appl., vol. 26, pp. 411-437, 1969.


Roy L. Streit was born in Guthrie, OK, on October 14, 1947. He received the B.A. degree (with honors) in mathematics and physics from East Texas State University, Commerce, in 1968, the M.A. degree in mathematics from the University of Missouri, Columbia, in 1970, and the Ph.D. degree in mathematics from the University of Rhode Island, Kingston, in 1978.

He is currently an Adjunct Assistant Professor for the Department of Mathematics, University of Rhode Island. He was a Visiting Scholar in the Department of Operations Research, Stanford University, Stanford, CA, during 1981-1982. He joined the staff of the Naval Underwater Systems Center, New London, CT (then the Underwater Sound Laboratory), in 1970. He is an applied mathematician and has published work in several areas, including antenna design, complex function approximation theory and methods, and semi-infinite programming. He is currently conducting research sponsored by the Office of Naval Research in nonlinear optimization methods for acoustic array design.




Orthogonal  Polynomial  Based  Array  Design 
R.  L.  Streit 




Abstract 


Array weighting designs of the Dolph-Chebyshev and Kaiser-Bessel type are based mathematically on orthogonal polynomials. The theoretical properties of these polynomials give rise to the desirable properties of the resulting arrays. This paper presents results for array weights based on a very general set of orthogonal polynomials called the Jacobi polynomials. Many interesting array far-field beampatterns are exhibited. A practical means of computing all the array weights exactly by means of one fast Fourier transform (FFT) is given. This method is quick and accurate and can compute the weights for arrays having large numbers of elements. It can efficiently compute both Dolph-Chebyshev and discrete Kaiser-Bessel weights as special cases.




TM  851015 


I.  INTRODUCTION 

Array weights of the Dolph-Chebyshev and Kaiser-Bessel type are based on orthogonal polynomials. The desirable properties of these arrays are due entirely to the properties of the underlying orthogonal polynomials. In this paper the mathematical design methodology developed in [1], which parallels the techniques of the Dolph-Chebyshev designs, is further explored for Jacobi polynomials.

Analytical  expressions  for  the  underlying  weights  are  not  sought  here,  as 
they  are  in  [1],  because  such  expressions  are  probably  not  competitive  with 
the  exact  FFT  based  method  presented  in  this  paper.  The  primary  purpose  of 
this  paper  is  to  present  a  unified  FFT  based  method  for  computing  array 
weights  based  on  any  of  the  Jacobi  polynomials.  This  method  is  quick  and 
accurate  and  can  compute  the  weights  for  arrays  having  large  numbers  of 
elements.  The  Appendix  gives  a  short  Fortran  program  for  computing  the 
weights  (given  a  subroutine  for  the  FFT).  This  program  can  efficiently 
compute both Dolph-Chebyshev and discrete Kaiser-Bessel weights as special
cases. 

This paper represents, in a sense, a completion of certain ideas about
array design using orthogonal polynomials. Dolph was apparently the first to
use an orthogonal polynomial for designing an array weighting function. The
polynomial he used was the Chebyshev polynomial of the first kind, $T_n(x)$,
and he was able to prove an optimality condition. Unfortunately his proof of
this optimality condition relies on the unique behavior of the graph of
$T_n(x)$, and so generalizations of the optimality condition seem unlikely.
Nonetheless, the use of different orthogonal polynomials can lead to useful
weighting functions. For example, if the Chebyshev polynomial of the second
kind, $U_n(x)$, is used in place of $T_n(x)$, a weighting function of
Kaiser-Bessel type results (see [1]). It is also possible to use a general
family of orthogonal polynomials that contains both $T_n(x)$ and $U_n(x)$ as
special cases. The Gegenbauer polynomials, $C_n^{\mu}(x)$, are one such family;
that is, $T_n(x)$ and $U_n(x)$ are the special cases $\mu = 0$ and $\mu = 1$,
respectively. This family is used in [1] and gives useful and interesting
designs. The most general of the so-called classical orthogonal polynomials
that contains all these examples as special cases is the family of Jacobi
polynomials, $P_n^{(\alpha,\beta)}(x)$. For $\alpha = \beta = \mu - 1/2$ they
reduce, up to a constant factor, to $C_n^{\mu}(x)$.

Do Jacobi polynomials turn out to be useful? The examples presented in this
paper indicate that, although many interesting new array designs are possible
using the Jacobi polynomials, the most useful designs in this general family
are probably those that have already been discussed. Consequently the new
Jacobi designs might be said to be, at present, "a solution looking for a
problem."



II.  WEIGHT  GENERATION  BY  FFT 

The far-field beampattern of a general linear beamformer for a linear
equispaced array having $2N + 1$ elements ($N \ge 1$) with element positions
$x_k = k\lambda/(vD)$, $k = 0, \pm 1, \ldots, \pm N$, is given by

$$F(u) = \sum_{k=-N}^{N} w_k \exp\big(-i\,2\pi k u/(vD)\big), \tag{1}$$

where the integer $D \ge 1$ is given and $v > 0$ is a fixed real constant ($vD$ is the
number of elements per wavelength), $\lambda$ is the wavelength of the design
frequency, and $u$ is defined by

$$u = \sin\theta_a - \sin\theta_s, \tag{2}$$

where $\theta_s$, $-\pi/2 \le \theta_s \le \pi/2$, is the steering (look) angle and
$\theta_a$, $-\pi/2 \le \theta_a \le \pi/2$, is the arrival angle of a plane wave.
Both angles are measured from a line normal to the array axis. The weights
$\{w_k\}$ can be, in general, any set of complex constants.

Define the functions

$$t_D(z) = \sum_{k=-D}^{D} a_k z^k, \tag{3}$$

$$P_n(z) = \sum_{k=0}^{n} b_k z^k, \quad n \ge 0, \tag{4}$$

where $\{a_k\}$ and $\{b_k\}$ are specified constants. By simple algebra

$$P_n(t_D(z)) = \sum_{k=-nD}^{nD} c_k z^k, \tag{5}$$

where the $\{c_k\}$ depend on both $\{a_k\}$ and $\{b_k\}$. Substituting $z = \exp(-i\pi u/(vD))$
gives

$$H(u) = P_n\big(t_D(\exp(-i\pi u/(vD)))\big) \tag{6}$$

$$H(u) = \sum_{k=-nD}^{nD} c_k \exp\big(-i\,2\pi k u/(2vD)\big). \tag{7}$$




By comparing (7) with (1), it is seen that $H(u)$ is the far-field beampattern
of a linear array with $2nD + 1$ elements, equispaced $\lambda/(2vD)$ apart, and with
the weight $c_k$ applied to the $k$-th element. It will be shown that the array
weights $c_k$ can be computed from function values of $t_D$ and $P_n$ by means of an
FFT.

When some of the weights $c_k = 0$, the corresponding elements may be
eliminated from the physical array without altering the array's far-field
beampattern. Elimination of zero-weighted elements (whenever possible)
minimizes the total number of elements required. This consideration is
especially important when $t_D$ and $P_n$ are chosen so that $P_n(t_D(z))$ is
either even or odd in $z$, for then about half the elements need not be
physically present. This is discussed further in Section III.

We now show that the weights $\{c_k\}$ in (7) can be computed exactly from
(3) and (4) by the fast Fourier transform (FFT). It is stressed that this
procedure is theoretically exact, not approximate.

Let $f(u)$ be any complex-valued function of a real variable $u$ which can be
written exactly in the form

$$f(u) = \sum_{k=-p}^{p} d_k\, e^{-iku} \tag{8}$$

for some complex constants $\{d_k\}$. Let

$$f_k = \frac{1}{2p}\sum_{j=0}^{2p-1} F_j\, e^{i2\pi kj/(2p)}, \quad k = 0, 1, \ldots, 2p-1, \tag{9}$$

where

$$F_j = (-1)^j f\big((p-j)\pi/p\big), \quad j = 0, 1, \ldots, 2p-1. \tag{10}$$

Thus $\{f_k\}$ is the inverse FFT of order $2p$ of the sequence $\{F_j\}$. Substitute
(8) into (10), and then into (9), to get


fk  -i 

2p 


1 

2p 


2p  -  1 

£ 

j  =  o 


(-DJ 


p 

£ 

q  -  -P 


d  e 

q 


-i  wq(p-j)/P 


z  <-1>q  d, 

q  =  -p  H 


iwkj/p 


2p£  1  ei w(q  +  k)j/p 
j  =  0 

k«0,l,...,2p-l. 




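The inner sum equals $2p$ when $q + k = \pm p, \pm 3p, \ldots$ and equals zero
otherwise. Since $d_q = 0$ for $|q| > p$ and $0 \le k \le 2p - 1$, the only surviving
terms are $q = p - k$ and, when $k = 0$, also $q = -p$. Hence

$$f_0 = (-1)^p\,(d_{-p} + d_p), \tag{11.a}$$

$$f_k = (-1)^{p+k}\, d_{p-k}, \quad k = 1, \ldots, 2p-1. \tag{11.b}$$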
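As a numerical sanity check of (11), the short Python sketch below (not part of the original memorandum; the name `aliased_coefficients` is hypothetical) builds the samples $F_j$ of (10) for randomly chosen coefficients $d_q$, applies the inverse transform (9) as a plain $O(p^2)$ DFT standing in for the FFT, and leaves the result ready for comparison with (11.a) and (11.b).

```python
import cmath
import random

def aliased_coefficients(d, p):
    """Apply (10) and then the inverse transform (9) to f(u) = sum_q d[q] exp(-i q u)."""
    f = lambda u: sum(d[q] * cmath.exp(-1j * q * u) for q in range(-p, p + 1))
    F = [(-1) ** j * f((p - j) * cmath.pi / p) for j in range(2 * p)]       # (10)
    return [sum(F[j] * cmath.exp(1j * cmath.pi * k * j / p) for j in range(2 * p)) / (2 * p)
            for k in range(2 * p)]                                          # (9)

p = 4
random.seed(1)
d = {q: complex(random.uniform(-1, 1), random.uniform(-1, 1)) for q in range(-p, p + 1)}
fk = aliased_coefficients(d, p)
# By (11): fk[0] mixes d[p] and d[-p]; every other fk[k] equals (-1)**(p+k) * d[p-k].
```

Only $f_0$ mixes two coefficients, which is why the two end coefficients must be obtained by other means.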


Now let $f(u) = H(vDu/\pi)$, where $H(u)$ is given by (7), and $p = nD$. Then

$$F_j = (-1)^j P_n\big(t_D(e^{-i\pi(nD-j)/(nD)})\big), \quad j = 0, 1, \ldots, 2nD-1,$$

and the coefficients $c_k$ in (7) are just

$$c_k = a_{-D}^{\,n}\, b_n, \quad k = -nD, \tag{12.a}$$

$$c_k = (-1)^k f_{nD-k}, \quad k = -nD+1, \ldots, nD-1, \tag{12.b}$$

$$c_k = a_D^{\,n}\, b_n, \quad k = nD, \tag{12.c}$$

where $\{f_k\}_{k=0}^{2nD-1}$ is the inverse FFT of $\{F_j\}_{j=0}^{2nD-1}$. The coefficients
$c_{-nD}$ and $c_{nD}$ cannot be computed directly by FFT because of aliasing, as
indicated in (11.a); however, by direct appeal to the defining equations (3)-(6),
the extreme powers $z^{\pm nD}$ arise only from the highest-degree term $b_n t_D(z)^n$,
so $c_{-nD}$ and $c_{nD}$ are as given by (12.a) and (12.c), respectively. In fact,
(11.a) can be used as a check on numerical accuracy in the computations.
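The complete procedure (10) through (12) fits in a few lines. The Python sketch below is an illustration under stated assumptions, not a transcription of the Appendix program: it accepts arbitrary coefficients for $t_D$ (a dictionary keyed by power, which must include $\pm D$) and $P_n$ (a list $b_0, \ldots, b_n$), uses a plain DFT in place of an FFT, applies the sign factor $(-1)^k$ obtained by inverting (11.b), and includes a direct expansion of $P_n(t_D(z))$ as in (5) for comparison. All function names are hypothetical.

```python
import cmath

def laurent_mul(x, y):
    """Product of two Laurent polynomials stored as {power: coefficient}."""
    out = {}
    for i, u in x.items():
        for j, v in y.items():
            out[i + j] = out.get(i + j, 0j) + u * v
    return out

def direct_weights(a, b):
    """Reference: expand P_n(t_D(z)) as in (5) by Horner's rule on Laurent polynomials."""
    acc = {0: complex(b[-1])}
    for bk in reversed(b[:-1]):
        acc = laurent_mul(acc, a)
        acc[0] = acc.get(0, 0j) + bk
    return acc

def fft_weights(a, b, D):
    """Weights c_k, |k| <= nD, from function values of t_D and P_n only, per (10)-(12).
    A plain O(M^2) DFT stands in for the FFT; the arithmetic is otherwise the same."""
    n = len(b) - 1
    p = n * D
    t = lambda z: sum(ak * z ** k for k, ak in a.items())        # t_D(z), eq. (3)
    P = lambda x: sum(bk * x ** k for k, bk in enumerate(b))    # P_n(x), eq. (4)
    F = [(-1) ** j * P(t(cmath.exp(-1j * cmath.pi * (p - j) / p))) for j in range(2 * p)]
    f = [sum(F[j] * cmath.exp(1j * cmath.pi * k * j / p) for j in range(2 * p)) / (2 * p)
         for k in range(2 * p)]                                  # inverse DFT, eq. (9)
    c = {-p: a[-D] ** n * b[n], p: a[D] ** n * b[n]}            # (12.a), (12.c)
    for k in range(-p + 1, p):
        c[k] = (-1) ** k * f[p - k]                              # (12.b)
    check = abs(f[0] - (-1) ** p * (c[p] + c[-p]))               # (11.a) accuracy test
    return c, check

# Example with D = 2, n = 3 and arbitrary complex coefficients.
a = {-2: 0.3 + 0.1j, -1: -0.2j, 0: 0.4, 1: 0.1 - 0.3j, 2: 0.5 - 0.2j}
b = [0.7, -0.5, 0.25 + 0.1j, 1.5]
c_fft, err = fft_weights(a, b, D=2)
c_ref = direct_weights(a, b)
# A D = 1 case with odd nD = 3 exercises the sign factor in (12.b).
a1 = {-1: 0.5 + 0.2j, 0: -0.3, 1: 0.8}
c1_fft, err1 = fft_weights(a1, b, D=1)
c1_ref = direct_weights(a1, b)
```

The returned `check` is the (11.a) residual and plays the same role as the accuracy test in JACWTS.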


It should be obvious that some special forms of $t_D$ and $P_n$ can be used
advantageously to reduce the size of the FFT required to compute (12). In
general, however, no special structure exists and the smallest FFT size that
can be used has order $2nD$.


In cases where $2nD$ is not an integer power of two, the FFT is still
applicable by zero filling in $t_D$ and $P_n$. That is, $D$ is replaced by the
smallest power of two which exceeds or equals $D$, say $D'$. The function $t_D$ is
then merely considered to be a special case of $t_{D'}$. Similarly, $P_n$ is a
special case of $P_{n'}$ for the smallest power of two, $n'$, which is greater
than or equal to $n$. The required size of the FFT is thus $2n'D'$. The
coefficients $\{c_k\}$ are still given by (12); however, from (7), it must be
the case that $c_k = 0$ for $|k| > nD$.

The Appendix gives a Fortran subroutine for computing the array weights
by an FFT, given subroutines for evaluating $P_n(z)$ and $t_D(z)$. The program
is specialized to the case $D = 1$, but it can easily be altered to accommodate
larger values of $D$. The program is also written on the assumption that $2nD$ is
a power of two. As discussed in the preceding paragraph, zero filling allows
the most general situation to go through.
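The zero-filling rule amounts to rounding $n$ and $D$ up to powers of two before sizing the transform. A minimal Python sketch (the helper names are hypothetical):

```python
def next_power_of_two(m):
    """Smallest power of two that is >= m, for m >= 1."""
    return 1 << (m - 1).bit_length()

def zero_filled_fft_size(n, D):
    """FFT order 2 n' D' after zero filling t_D up to t_D' and P_n up to P_n'."""
    return 2 * next_power_of_two(n) * next_power_of_two(D)
```

For example, $n = 32$ and $D = 1$ need no zero filling (order 64), while $n = 33$ and $D = 1$ require an FFT of order 128.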




III.  THE  TEN  PARAMETER  JACOBI  WEIGHT  FAMILY 

The most important special case of $H(u)$ seems to be $D = 1$. Although
there may be interesting possibilities when $D > 1$ (consider, for example, the
identity $T_n(T_D(\cos\theta)) = T_{nD}(\cos\theta)$, where $T_n$ is the Chebyshev
polynomial of the first kind), these cases are not explored in this paper.

Before proceeding, it is instructive to see first how the usual
Dolph-Chebyshev case for half-wavelength equispaced arrays is derived as a
special case of $H(u)$. Let

$N$ = number of array elements,
$D = 1$,
$\tilde N = N - 1$,
$t_1(z) = Z_0\,(z^{-1} + z)/2$,
$P_{\tilde N}(z) = T_{\tilde N}(z)$,

where $T_{\tilde N}(z)$ is the Chebyshev polynomial of the first kind and the real
constant $Z_0$ is given by

$$Z_0 = \tfrac{1}{2}\left\{\left(Q + (Q^2-1)^{1/2}\right)^{1/\tilde N} + \left(Q - (Q^2-1)^{1/2}\right)^{1/\tilde N}\right\}, \tag{15}$$

where

$Q = 10^{|S|/20}$,
$S$ = specified sidelobe level (in dB).

For these values it follows that $t_1(\exp(-i\pi u/2)) = Z_0\cos(\pi u/2)$ and, from
(6) and (7),

$$H(u) = T_{\tilde N}\big(Z_0\cos(\pi u/2)\big) \tag{16}$$

$$H(u) = \sum_{k=-\tilde N}^{\tilde N} c_k \exp(-i\pi k u/2). \tag{17}$$

By comparing (17) to (1), it would appear at first glance that this array is
quarter-wavelength equispaced with $2\tilde N + 1 = 2N - 1$ elements. However, every
other coefficient in (17) is identically zero because $T_{\tilde N}(z)$ is always either
an even or an odd function of $z$, while $t_1(z)$ contains only odd powers of $z$.
Deleting the zero-weighted elements reduces (17) to two slightly different cases,
depending only on whether $N$ is even or odd. These cases are not given explicitly
here. Note, however, that $N$ even implies that $\tilde N$ is odd and that $T_{\tilde N}(z)$
contains no even powers of $z$, which means that $T_{\tilde N}(z)$ has $(\tilde N + 1)/2$
nonzero coefficients and, consequently, that $T_{\tilde N}(t_1(z))$ has precisely
$\tilde N + 1 = N$ nonzero terms in the expansion (17). Similar reasoning holds for
$N$ odd.
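The Dolph-Chebyshev special case can be reproduced directly. The Python sketch below (hypothetical names; it assumes $N \ge 2$) computes $Z_0$ from (15) and expands $T_{\tilde N}(t_1(z))$ with the recurrence $T_{m+1}(x) = 2xT_m(x) - T_{m-1}(x)$ applied to Laurent polynomials in $z$, so the zero pattern of (17) is visible: only every other power of $z$ ever appears.

```python
import math

def dolph_z0(S_dB, Nt):
    """Z_0 from (15), with Q = 10**(|S|/20) and Nt = N - 1."""
    Q = 10 ** (abs(S_dB) / 20)
    r = math.sqrt(Q * Q - 1)
    return 0.5 * ((Q + r) ** (1 / Nt) + (Q - r) ** (1 / Nt))

def times_t1(c, Z0):
    """Multiply a Laurent polynomial {power: coeff} by t_1(z) = Z0 (z^-1 + z)/2."""
    out = {}
    for k, v in c.items():
        for s in (-1, 1):
            out[k + s] = out.get(k + s, 0.0) + 0.5 * Z0 * v
    return out

def dolph_weights(N, S_dB):
    """Coefficients c_k of (17), i.e. of T_{N-1}(t_1(z)), via the Chebyshev recurrence."""
    Nt = N - 1
    Z0 = dolph_z0(S_dB, Nt)
    prev, cur = {0: 1.0}, times_t1({0: 1.0}, Z0)      # T_0(t_1), T_1(t_1)
    for _ in range(Nt - 1):
        nxt = {k: 2 * v for k, v in times_t1(cur, Z0).items()}
        for k, v in prev.items():
            nxt[k] = nxt.get(k, 0.0) - v
        prev, cur = cur, nxt
    return Z0, cur

Z0, c = dolph_weights(33, -30)
```

For $N = 33$ and $S = -30$ dB this gives $Z_0 \approx 1.008408$ and 33 nonzero coefficients among the 65 quarter-wavelength positions; the coefficients sum to $H(0) = T_{32}(Z_0) = Q$.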



To summarize: The Dolph-Chebyshev array design is thought of, in the
context of this paper, as a quarter-wavelength equispaced array in which half
the elements (every other one) have been zero weighted. Dolph-Chebyshev arrays
of both even and odd numbers of elements are thought of in this way.

From (3), the most general form for $D = 1$ is

$$t_1(z) = a_{-1} z^{-1} + a_0 + a_1 z.$$

For reasons that will become clear, the slightly more restrictive form

$$t_1(z) = z_0\left(r_0^{-1} z^{-1} + a_0 + r_0 z\right)/2, \quad r_0 \ne 0, \tag{18}$$

is adopted, where $z_0$, $r_0$, and $a_0$ are arbitrary complex constants. The
only useful form excluded by (18) is obtained by changing the sign of the
highest-order term; this latter form corresponds to "difference" patterns and,
for ease of exposition, is not discussed further. For the remainder of the
paper, (18) is taken as the definition of $t_1(z)$. Note that $t_1(e^{-i\theta}) =
z_0\cos\theta$ if $r_0 = 1$ and $a_0 = 0$.

The most general class of polynomials $P_n(z)$ considered in this paper is
the Jacobi polynomials, denoted $P_n^{(\alpha,\beta)}(z)$. They are defined explicitly by

$$P_n^{(\alpha,\beta)}(z) = \frac{1}{n!}\sum_{k=0}^{n}\binom{n}{k}(n+\alpha+\beta+1)_k\,(\alpha+k+1)_{n-k}\left(\frac{z-1}{2}\right)^k \tag{19}$$

for all complex values of $\alpha$, $\beta$, and $z$, where $(x)_k = x(x+1)\cdots(x+k-1)$
is the rising factorial. For $\alpha > -1$ and $\beta > -1$, the Jacobi
polynomials are orthogonal on the real interval $-1 \le z \le 1$, but they are not
necessarily orthogonal for other values of $\alpha$ and $\beta$. The best available method
for computing $P_n^{(\alpha,\beta)}(z)$ relies not on (19) but on the three-term
recursion which they satisfy. The published algorithm [2], [3] based on this
recursion is easily modified to compute the Jacobi polynomials for complex $\alpha$,
$\beta$, and $z$ for all values of the degree $n$ that are likely to be of practical
interest, say $n < 150$. A thorough mathematical treatment of $P_n^{(\alpha,\beta)}(z)$
is available in [4].
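For reference, the three-term recursion can be written out as follows. This is a minimal Python sketch of the standard recurrence for $P_n^{(\alpha,\beta)}$, not the published Algorithm 332 of [2], [3]. It accepts complex $\alpha$, $\beta$, and $z$, does not guard against parameter values that make the recurrence's leading coefficient vanish, and includes a direct evaluation of (19) only for cross-checking. The function names are hypothetical.

```python
import math

def jacobi(n, alpha, beta, z):
    """P_n^(alpha,beta)(z) by the standard three-term recurrence (complex inputs allowed)."""
    p_prev = 1 + 0j
    if n == 0:
        return p_prev
    p_cur = (alpha + 1) + (alpha + beta + 2) * (z - 1) / 2
    for m in range(2, n + 1):
        s = 2 * m + alpha + beta
        c1 = 2 * m * (m + alpha + beta) * (s - 2)
        c2 = (s - 1) * (s * (s - 2) * z + alpha ** 2 - beta ** 2)
        c3 = 2 * (m + alpha - 1) * (m + beta - 1) * s
        p_prev, p_cur = p_cur, (c2 * p_cur - c3 * p_prev) / c1
    return p_cur

def pochhammer(x, k):
    """Rising factorial (x)_k = x (x+1) ... (x+k-1)."""
    out = 1 + 0j
    for i in range(k):
        out *= x + i
    return out

def jacobi_explicit(n, alpha, beta, z):
    """Direct evaluation of the explicit formula (19), used here only as a check."""
    total = sum(math.comb(n, k) * pochhammer(n + alpha + beta + 1, k)
                * pochhammer(alpha + k + 1, n - k) * ((z - 1) / 2) ** k
                for k in range(n + 1))
    return total / math.factorial(n)
```

With $\alpha = \beta = -1/2$ the result is proportional to $T_n$; for example, $P_3^{(-1/2,-1/2)}(x) = \tfrac{5}{16}T_3(x)$.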

The generalized Laguerre and Hermite polynomials, together with the
Jacobi polynomials, constitute a complete list of the so-called classical
orthogonal polynomials. Array designs can also be based on the Laguerre and
Hermite polynomials and will, of course, be different from those based on the
Jacobi polynomials. Although these designs are probably interesting, in this
paper attention is restricted to the Jacobi polynomials.

The use of (18) and (19) gives rise to a five-parameter family of weights
which includes nearly all the well-known analytic families of weights as
special cases. (The most prominent exception is Taylor weighting.) The five
parameters are $z_0$, $r_0$, $a_0$, $\alpha$, and $\beta$. Each parameter can be complex, so
there are actually ten real parameters if the real and imaginary parts of each
are counted separately. The Fortran program listed in the Appendix is written
for this general case.



The simplest way to explore the properties of these ten parameters is to
perturb each parameter separately while holding the others fixed at some nominal
value. The nominal parameter values chosen here for the examples are those that
give rise to the Dolph-Chebyshev design for an array of 33 elements with a
sidelobe level of -30 dB. Specifically,

$$\alpha_0 = -0.50 + 0.0i, \quad \beta_0 = -0.50 + 0.0i, \quad a_0 = 0.0 + 0.0i, \quad r_0 = 1.0 + 0.0i, \quad z_0 = Z_0 = 1.008408 + 0.0i, \tag{20}$$

where $Z_0$ is computed from (15) with $S = -30$ dB and $\tilde N = 32$. Recalling earlier
remarks in this section concerning the interpretation of half-wavelength
equispaced arrays as quarter-wavelength equispaced arrays with zero weights, set
$n = 32$ in (6) and (7). Thus, in principle, the example is a $2n + 1 = 65$ element
quarter-wavelength equispaced array. The coefficients $\{c_k\}$ are computed from
(12). In (12.a) and (12.c), note that the identity [4, Eq. (4.21.6)]

$$b_n = 2^{-n}\binom{2n+\alpha+\beta}{n}, \quad n \ge 0, \tag{21}$$

holds and so, from (18),

$$a_1 = \tfrac{1}{2} z_0 r_0, \qquad a_{-1} = \tfrac{1}{2} z_0 r_0^{-1}. \tag{22}$$
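The end weights follow from (12.a) and (12.c) once $b_n = 2^{-n}\binom{2n+\alpha+\beta}{n}$ and $a_{\pm 1} = \tfrac{1}{2} z_0 r_0^{\pm 1}$ are known. Writing the binomial coefficient as the finite product $\prod_{i=1}^{n}(n+\alpha+\beta+i)/i$ keeps it valid for complex $\alpha$ and $\beta$; this is the same product accumulated in loop 18 of JACWTS. A minimal Python sketch for $D = 1$, with hypothetical function names:

```python
def leading_coefficient(n, alpha, beta):
    """b_n = 2**(-n) * binom(2n + alpha + beta, n), as a product valid for complex alpha, beta."""
    b = 1 + 0j
    for i in range(1, n + 1):
        b *= (n + i + alpha + beta) / (2 * i)
    return b

def end_weights(n, alpha, beta, z0, r0):
    """(c_{-n}, c_{+n}) for D = 1: (a_{-1})**n * b_n and (a_1)**n * b_n."""
    b = leading_coefficient(n, alpha, beta)
    return (0.5 * z0 / r0) ** n * b, (0.5 * z0 * r0) ** n * b
```

For $r_0 = 1$ the two end weights coincide; otherwise their ratio is $r_0^{2n}$, consistent with the asymmetric weightings reported below for perturbations of $r_0$.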


Each of the ten parameters is both increased and decreased from
its nominal value given in (20). Thus there are 21 cases, including the
nominal case itself in (20). Table 1 displays these cases and gives each an
identifying case name. To each case in Table 1 there corresponds a graph of
the far-field beampattern $H(u)$ on the $u$-interval $[0, 4]$ and two bar charts, one
for the real part and one for the imaginary part of the weights corresponding
to that case.


Figure  Case names           Value of perturbed parameter
1       NOM                  no deviations from (20)
2       Z.1 Z.2 Z.3 Z.4      z0+.003   z0-.003   z0+.003i   z0-.003i
3       A.1 A.2 A.3 A.4      a0+.003   a0-.003   a0+.003i   a0-.003i
4       R.1 R.2 R.3 R.4      r0+.03    r0-.03    r0+.03i    r0-.03i
5       α.1 α.2 α.3 α.4      α0+.3     α0-.3     α0+.3i     α0-.3i
6       β.1 β.2 β.3 β.4      β0+.3     β0-.3     β0+.3i     β0-.3i

Table 1  Perturbed parameter values; deviations from the nominal values (20)

In general, $H(u)$ is periodic with a period of length $2vD$. Since $D = 1$
and $v = 2$ in these examples, any interval of length 4 suffices to exhibit all
the structure of $H(u)$.



In  all  the  bar  charts  presented,  upward  lines  indicate  positive  weights 
and  length  is  proportional  to  magnitude.  Similarly,  downward  lines  indicate 
negative  weights.  These  upward  and  downward  lines  are  ordered  from  left  to 
right  and  correspond  to  elements  numbered  from  -32  to  +32.  Any  element 
receiving  a  zero  weight  is  indicated  by  a  simple  "x"  marking  its  position.  In 
particular,  notice  that  the  nominal  case,  NOM,  of  Figure  1  has  only  33 
non-zero  weights.  The  nominal  case,  as  has  been  said,  is  a  half  wavelength 
equispaced  array  being  treated  as  a  quarter  wavelength  equispaced  array.  The 
weights  in  each  case  are  normalized  by  the  largest  magnitude  of  any  real  or 
imaginary  part;  thus,  the  normalization  between  cases  is  not  exactly  the  same. 

Figure  1  is  the  reference  case  (20)  and  needs  no  further  comment. 

Figure 2 perturbs only $z_0$. The results for cases Z.1 and Z.2 are as expected, since
$z_0$ is merely increased or decreased in its real part alone. When $z_0$ is
perturbed by adding an imaginary component, the array still has 33 nonzero
weights and so is, in effect, half-wavelength equispaced. It is surprising
how much can be added to the imaginary parts of the weights without seriously
degrading the beampattern. The beampatterns in Z.3 and Z.4 are identical.

Figure 3 perturbs $a_0$ from its nominal zero value. Any perturbation
produces a quarter-wavelength equispaced array which is symmetrically
weighted. One way to discuss the results is to visualize the 65 element array
as being composed of two half-wavelength equispaced arrays, one having 33
elements and the other 32 elements, with the elements of the two arrays
interlaced. Thus, perturbing the real part of $a_0$ is equivalent to adding or
subtracting the outputs of these two arrays. Perturbing the imaginary part of
$a_0$ is equivalent to adding or subtracting the outputs after first putting
them in phase quadrature with respect to each other. The beampatterns A.3 and
A.4 are identical to each other, but they are NOT the same as Z.3 and Z.4.

Figure 4 perturbs $r_0$ from its nominal value of +1. Any small
perturbation produces asymmetrically weighted half-wavelength equispaced
arrays. Real perturbations of $r_0$ produce only real weights and have
beampatterns without any true nulls. Pure imaginary perturbations do not
alter the beampattern from its nominal case, even though the weights develop
an interesting sinusoidal character in their imaginary parts.

Figure 5 perturbs $\alpha$ from its nominal value of $\alpha_0 = -1/2$. Any
perturbation produces a quarter-wavelength equispaced array which is
symmetrically weighted. Real perturbations yield real weights while pure
imaginary perturbations yield complex weights. The first grating lobe in case
α.1 is at about -8 dB instead of 0 dB; the same is true of the MRA in case α.2.

Figure 6 perturbs $\beta$ from its nominal value of $\beta_0 = -1/2$. Any
perturbation produces quarter-wavelength equispaced arrays which are
symmetrically weighted. Real perturbations yield real weights, while pure
imaginary perturbations yield complex weights. The first grating lobe in case
β.2 is suppressed to about -8 dB, while in case β.1 the MRA is depressed to
-8 dB. Figure 6 and Figure 5 should be compared.




The following observations seem to hold:

1. The real part of $\alpha$ controls the "upper" envelope of $H(u)$ near the
center peak.

2. The absolute value of the imaginary part of $\alpha$ controls the "lower"
envelope of $H(u)$ near the center peak.

3. The real part of $\beta$ controls the "upper" envelope of $H(u)$ near the
first grating lobe.

4. The absolute value of the imaginary part of $\beta$ controls the "lower"
envelope of $H(u)$ near the first grating lobe.

Other parameters (the imaginary parts of $a_0$ and $z_0$) also affect the "lower"
envelope of $H(u)$, but the dominant effects seem to be due to the imaginary
parts of $\alpha$ and $\beta$. The imaginary part of $r_0$ does not affect the lower envelope
at all.

By  changing  the  parameters  simultaneously  in  different  ways,  the 
different  effects  may  be  combined,  at  least  for  small  perturbations.  Examples 
of  this  are  not  included  here. 



IV. SUMMARY AND CONCLUDING REMARKS

It  has  been  shown  that  array  weights  based  on  the  Jacobi  orthogonal 
polynomials  can  be  computed  exactly  by  means  of  FFT.  As  a  special  case, 
weights  based  on  the  Gegenbauer  polynomials  can  also  be  computed  exactly  by 
FFT,  instead  of  analytically  as  in  [1].  Examples  have  been  presented  to  show 
the  effects  of  varying  the  ten  parameters  in  the  Jacobi  family. 

Further  work  in  this  area  is  possible.  In  addition  to  the  Jacobi 
polynomials,  one  may  also  use  the  generalized  Laguerre  and  the  Hermite 
polynomials.  In  fact,  any  orthogonal  polynomial  family  that  has  interesting 
structural  features  can  be  the  basis  of  a  weighting  family  which  inherits  this 
structure.  In  a  different  direction,  certain  cases  for  D  >  1  may  yield 
interesting  designs  and  have  not  been  explored.  The  weights  corresponding  to 
all these cases can be computed exactly by the FFT method presented in this paper.




APPENDIX 


The Fortran program JACWTS listed below assumes that $D = 1$ and that $2nD$
is a power of 2. The function $t_D(z)$ is defined exactly as in (18), and the
polynomial $P_n(z)$ is taken to be the Jacobi polynomial $P_n^{(\alpha,\beta)}(z)$.

This program is an implementation of the (exact) FFT method described by
Eq. (12), where $a_{\pm D}$ and $b_n$ are given by (22) and (21), respectively. The
user of JACWTS need only specify values for $\alpha$, $\beta$, $a_0$, $r_0$, and $z_0$. In
JACWTS these variables are referred to by the labels ALPHA, BETA, A0, R0, and
Z0, respectively. The arrays X and Y contain, on output, the real and
imaginary parts of the array weights $\{c_k\}$. These two arrays must be
dimensioned at least $2nD + 1$ in the routine which calls JACWTS. The integers $n$
and $D$ are referred to by the labels N and D, respectively, in the subroutine
argument list. Also, LOGN and LOGD are defined so that N = 2**LOGN and D =
2**LOGD.

This program assumes that a function named JACOBI evaluates the Jacobi
polynomial (19) for arbitrary complex values of $\alpha$, $\beta$, and $z$. This function
can be based on the published codes in [2] and [3]. This program also assumes
that subroutines are available for computing a complex FFT of size $2nD$; the
particular ones used here are based on Markel's method and are not listed.
Their names are DPMCOS and DPMFFT. These routines require a work array, C,
dimensioned at least $2nD$ in the routine which calls JACWTS.

JACWTS is written in double precision complex mode to forestall any
numerical round-off error problems that might arise. The test suggested in
Section II (that follows from the resolution of the aliasing effects as in
(12)) is incorporated. It is the only test used to ascertain whether
numerical round-off of significant proportions occurred. No numerical
difficulties have been detected by this test to date, which indicates that the
computation is usually numerically reliable.




      SUBROUTINE JACWTS(X,Y,C,N,LOGN,D,LOGD,ALPHA,BETA,A0,R0,Z0)
      COMPLEX*16 Z,T,H,S,R,TSUBD,ALPHA,BETA,Z0,A0,R0,JACOBI
      INTEGER N,LOGN,D,LOGD,M,ND,TWOND,I,J
      DOUBLE PRECISION EPSI,ARG,PI,X(*),Y(*),C(*)
      DATA PI,EPSI/3.141592653589793238D0,0.5D-7/
      M=LOGN+LOGD+1
      ND=N*D
      TWOND=2*ND
C ... ALL WEIGHTS EXCEPT THE FIRST AND THE LAST
      I=1
      DO 10 J=1,TWOND
      ARG=-PI*(ND-J+1)/ND
      Z=DCMPLX(COS(ARG),SIN(ARG))
      T=TSUBD(D,A0,R0,Z0,Z)
      H=JACOBI(N,ALPHA,BETA,T)
      X(J)=I*DREAL(H)
      Y(J)=I*DIMAG(H)
      I=-I
   10 CONTINUE
      CALL DPMCOS(C,TWOND)
      CALL DPMFFT(X,Y,C,M,M)
C ... SIGN FACTOR (-1)**K OF (12.B), STARTING FROM K = -ND
      I=1
      IF(MOD(ND,2).EQ.1) I=-1
      DO 15 J=1,TWOND
      X(J)=I*X(J)/TWOND
      Y(J)=I*Y(J)/TWOND
      I=-I
   15 CONTINUE
C ... THE FIRST AND LAST WEIGHTS, FROM (12.A), (12.C), (21) AND (22)
      H=.25D0*Z0*R0
      S=.25D0*Z0/R0
      R=1.0D0
      T=1.0D0
      DO 18 I=1,N
      Z=(DBLE(N+I)+ALPHA+BETA)/I
      T=T*H*Z
      R=R*S*Z
   18 CONTINUE
C ... NUMERICAL ACCURACY TEST FROM (11.A): ENTRY 1 MUST EQUAL R+T.
C ... IT IS DONE BEFORE ENTRY 1 IS OVERWRITTEN WITH THE FIRST WEIGHT.
      ARG=(ABS(X(1)-DREAL(R+T))+ABS(Y(1)-DIMAG(R+T)))
     *   /(1.0D0+ABS(X(1))+ABS(Y(1)))
      IF(ARG.GT.EPSI) PRINT 50
   50 FORMAT(' NUMERICAL ROUND-OFF ERROR IS SIGNIFICANT.')
      X(1)=DREAL(R)
      Y(1)=DIMAG(R)
      X(TWOND+1)=DREAL(T)
      Y(TWOND+1)=DIMAG(T)
      RETURN
      END

      COMPLEX*16 FUNCTION TSUBD(D,A0,R0,Z0,Z)
      COMPLEX*16 Z,Z0,A0,R0
      INTEGER D
C     TREAT THE CASE D=1 ONLY; OTHER VALUES OF D ARE IGNORED.
      TSUBD=.5D0*Z0*((1.0D0/(R0*Z))+A0+(R0*Z))
      RETURN
      END




REFERENCES 


1. R. L. Streit, "A Two-Parameter Family of Weights for Nonrecursive Digital
Filters and Antennas," IEEE Trans. on ASSP, Vol. ASSP-32, February 1984,
pp. 108-118.

2. B. F. W. Witte, "Algorithm 332, Jacobi Polynomials," Comm. ACM, Vol. 11,
June 1968, p. 436.

3. O. Skovgaard, "Remark on Algorithm 332," Comm. ACM, Vol. 18, February 1975,
pp. 116-117.

4. G. Szegő, Orthogonal Polynomials, Fourth Edition, AMS Colloquium
Publications, Vol. XXIII, American Mathematical Society, Providence, RI,
1975.




A  Discussion  Of  Taylor  Weighting 
For  Continuous  Apertures 

R.  L.  Streit 




Abstract 


It is shown that Taylor's beampattern for a continuous
aperture can be computed analytically without Fourier
transforming the weighting function itself, thereby achieving
economies in computational effort in some modeling situations. A
short Fortran program is given. An approximate formula for the
half-power beamwidth is derived. It is pointed out that the Taylor
weighting function can be negative for large $\bar n$, a fact that does not
seem to be well known. In addition, modification of Taylor's design
to force the weighting function to go to zero as a power $\alpha$ of
distance from the aperture endpoints is discussed. For $\alpha = 1$ and
$\alpha = 2$, this results in an increase of 5 and 10 percent, respectively, in
the beamwidth.




TM  No.  851004 


I.  INTRODUCTION 

This Memorandum is a review of Taylor's original weighting function for
continuous apertures. It is presented in some detail in Sections II and III.
It is shown that Taylor's beampattern and weighting function can be computed
easily by analytically exact formulas. Taylor's beampattern turns out to be
the product of a rational function and the beampattern of a uniformly weighted
aperture.

Also reviewed is a modification, due to Rhodes, of Taylor's pattern for the
purpose of forcing the weighting function to go to zero as a power $\alpha$ of distance
from the aperture endpoints. This results in a 5% increase in beamwidth over
the beamwidth of Taylor's original pattern if $\alpha = 1$, and a 10% increase if
$\alpha = 2$ (for $\bar n = 10$; see below). These modifications are discussed in Section
IV.


Taylor's  original  paper  [1]  derives  a  symmetric  weighting  function  for  a 
continuous  aperture.  He  does  not  discuss  or  even  mention  its  use  for  arrays 
of  point  sensors.  His  method  is  essentially  an  ad  hoc,  but  intuitively 
sensible,  procedure  which  blends  together  the  desirable  characteristics  of 
uniform  weighting  and  the  van  der  Maas  weighting  into  one  weighting  design. 

The blending is accomplished by careful specification of the beampattern
nulls. The various sidelobe levels do not enter the method's derivation. In
other words, the sidelobes are whatever they turn out to be after
specification of the nulls.

It is often said that Taylor weighting makes the first few sidelobes near
the mainlobe nearly flat; that is, all "near-in" sidelobes have essentially
the same amplitude. This statement is erroneous. See Figure 1, for example,
where the 9 sidelobes ($\bar n = 10$) nearest the mainlobe would all be at -20 dB if
the statement were true. Instead, the first sidelobe is at -20 dB and the
ninth sidelobe is at (roughly) -25 dB.

It is a useful fact that the beampattern corresponding to Taylor
weighting can be computed analytically, without Fourier transforming the
weighting function. This can be seen from Taylor's original discussion [1],
which is reviewed in this Memorandum. Taylor's original notation is retained
here. Appendix A gives a FORTRAN program which computes the beampattern
and/or the weighting function using the analytical formulas developed below.
In addition, it computes the exact half-power beamwidth.


The aperture is assumed to lie on the $p$-interval from $-\pi$ to $+\pi$. The
weighting function $g(p)$ is related to the far-field beampattern $F(z)$ by

$$F(z) = \int_{-\pi}^{\pi} g(p)\, e^{izp}\, dp. \tag{1}$$

Taylor assumes throughout that $g(p)$ is a real even function. Consequently,
$F(z)$ is also an even function of $z$.

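It is well known that

$$F(z) = \frac{1}{2}\sum_{m=0}^{\infty}\epsilon_m F(m)\left[\frac{\sin\pi(z-m)}{\pi(z-m)} + \frac{\sin\pi(z+m)}{\pi(z+m)}\right], \tag{2}$$

where $\epsilon_0 = 1$ and $\epsilon_m = 2$ for $m \ge 1$. In other words, knowledge of the
integer samples of $F(z)$ implies knowledge of $F(z)$ everywhere. A very
different representation of $F(z)$ is the infinite product

$$F(z) = F(0)\prod_{n=1}^{\infty}\left(1 - \frac{z^2}{z_n^2}\right), \tag{3}$$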


where $\{z_1, z_2, \ldots\}$ is a complete list of all the positive zeros of $F(z)$.
It is an interesting mathematical fact that these zeros must all lie on the
real $z$-axis. For example, uniform weighting $g(p) = 1/(2\pi)$ gives

$$F(z) = \frac{\sin\pi z}{\pi z},$$

whose positive nulls are $\{1, 2, 3, \ldots\}$. From (3), then,

$$\frac{\sin\pi z}{\pi z} = \prod_{n=1}^{\infty}\left(1 - \frac{z^2}{n^2}\right), \tag{4}$$

a well-known identity dating back at least to Euler's time (circa 1750).
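Both representations can be checked numerically. The Python sketch below (hypothetical function names) rebuilds a band-limited test pattern from its integer samples with a truncation of the sampling series (2), and evaluates a truncation of Euler's product (4). The test pattern $[\sin(\pi z/2)/(\pi z/2)]^2$, which is the beampattern of the triangular weighting $g(p) = (1 - |p|/\pi)/\pi$ on $[-\pi, \pi]$, is chosen here only for illustration and does not appear in the memorandum.

```python
import math

def sinc_pi(x):
    """sin(pi x)/(pi x), with the removable singularity at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sampling_series(F, z, terms):
    """Truncation of (2): F(z) rebuilt from the integer samples F(0), F(1), ..., F(terms)."""
    total = 0.0
    for m in range(terms + 1):
        eps = 1 if m == 0 else 2
        total += 0.5 * eps * F(m) * (sinc_pi(z - m) + sinc_pi(z + m))
    return total

def euler_product(z, terms):
    """Truncation of (4): prod_{n=1}^{terms} (1 - z^2/n^2)."""
    prod = 1.0
    for n in range(1, terms + 1):
        prod *= 1 - (z / n) ** 2
    return prod

# Band-limited test pattern: beampattern of a triangular weighting on [-pi, pi].
F_tri = lambda z: sinc_pi(z / 2) ** 2
```

The sampling series converges quickly for this pattern, while the product (4) converges slowly: truncating after $K$ factors leaves a relative error of roughly $z^2/K$.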

By means of his choice of nulls $\{z_n\}$ in the representation (3) of $F(z)$,
Taylor sought a beampattern which had a flat envelope near the mainbeam and,
for large $z$, an asymptotic 6 dB/octave decay rate. He also sought by this
same means a physically realizable aperture to approximate the physically
unrealizable ideal van der Maas function. (It is unrealizable because of the
presence of delta function spikes at the aperture endpoints, $p = \pm\pi$.) Taylor
found a set of nulls which came close to attaining his first objective and
which did attain his second objective. The next section is a description of
Taylor's nulls.

II.  TAYLOR'S  NULL  SPECIFICATION 

Taylor specifies the nulls $z_n$ of his beampattern, starting with $n = \bar n$,
to be exactly the same as those of the uniform weighting function; that is,

$$z_n = n \quad \text{for } n = \bar n, \bar n + 1, \ldots \tag{5}$$

The positive integer $\bar n$ is a free parameter which can be chosen as desired.
Note that $\bar n = 1$ gives exactly uniform shading. Note also that the null list
(5) guarantees a 6 dB/octave asymptotic decay rate as $z \to \infty$. (This follows
from (8) below.)




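To complete the list of positive nulls for his beampattern, Taylor selects (when n̄ > 1) the "near-in" nulls to be

$$z_n = \sigma\sqrt{A^2 + (n - \tfrac{1}{2})^2}, \qquad n = 1, 2, \ldots, \bar n - 1, \tag{6}$$

where

$$\sigma = \frac{\bar n}{\sqrt{A^2 + (\bar n - \frac{1}{2})^2}}, \qquad A = \frac{1}{\pi}\cosh^{-1}\left(10^{-S/20}\right),$$

and S = maximum sidelobe level (in dB).

This choice for the first n̄ − 1 nulls may seem mysterious at first glance, but it is a choice based on the ideal van der Maas function [2], defined by

$$F_0(z, A) = \cos\left(\pi\sqrt{z^2 - A^2}\right), \qquad A > 0.$$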

It is an interesting mathematical fact that among all beampattern functions F(z) such that

(a) F(z) has a Fourier transform vanishing outside the aperture −π to +π, and

(b) |F(z)| ≤ 1 for |z| ≥ A,

the one with the maximum possible value at z = 0 is the van der Maas function F(z) = F_0(z, A). The positive nulls of F_0(z, A) are

$$z_n = \sqrt{A^2 + (n - \tfrac{1}{2})^2}, \qquad n = 1, 2, 3, \ldots$$

Comparison of these nulls with Taylor's ad hoc null specification (6) shows that Taylor's nulls are related to the van der Maas nulls by a dilation factor σ. The factor σ is chosen to be slightly larger than unity to compensate for the 6 dB/octave decay of the beampattern for z ≥ n̄. Note that n̄ = ∞ gives exactly the van der Maas beampattern.
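The null specification (5)-(6) is easy to tabulate. The sketch below assumes A = (1/π)cosh⁻¹(10^(−S/20)) and σ = n̄/√(A² + (n̄ − ½)²), which are the same quantities computed at the top of the Appendix A listing. The function names are illustrative only.

```python
import math


def taylor_parameters(nbar, sll_db):
    """Return (A, sigma) for Taylor's near-in nulls; sll_db is S in dB, e.g. -20.0."""
    A = math.acosh(10.0 ** (-sll_db / 20.0)) / math.pi
    sigma = nbar / math.sqrt(A * A + (nbar - 0.5) ** 2)
    return A, sigma


def taylor_nulls(nbar, sll_db, count):
    """First `count` positive nulls z_1, z_2, ... of Taylor's beampattern."""
    A, sigma = taylor_parameters(nbar, sll_db)
    nulls = []
    for n in range(1, count + 1):
        if n < nbar:
            # dilated van der Maas null, eq. (6)
            nulls.append(sigma * math.sqrt(A * A + (n - 0.5) ** 2))
        else:
            # null of the uniformly weighted aperture, eq. (5)
            nulls.append(float(n))
    return nulls


print(taylor_parameters(5, -20.0))
print(taylor_nulls(5, -20.0, 7))
```

Because of this choice of σ, formula (6) evaluated at n = n̄ gives exactly n̄. The near-in nulls therefore join the integer nulls of (5) without a gap.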

III.  TAYLOR'S  BEAMPATTERN  AND  WEIGHTING  FUNCTION. 

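Taylor's beampattern can now be expressed, using (3), as

$$F(z) = \prod_{n=1}^{\infty}\left(1 - \frac{z^2}{z_n^2}\right) = \prod_{n=1}^{\bar n-1}\left(1 - \frac{z^2}{z_n^2}\right)\prod_{n=\bar n}^{\infty}\left(1 - \frac{z^2}{n^2}\right). \tag{7}$$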


The last expression in (7) can be rewritten, using (4), to give

$$F(z) = \frac{\sin\pi z}{\pi z}\prod_{n=1}^{\bar n-1}\frac{1 - z^2/z_n^2}{1 - z^2/n^2}. \tag{8}$$

In this expression, limits must be taken whenever z = 0, 1, 2, ..., n̄ − 1 to avoid the indeterminate form 0/0. See Appendix B. Note that Taylor's beampattern is identically the product of a rational function (of degree n̄ − 1 in z²) and the beampattern of the uniformly weighted aperture, sin(πz)/(πz).
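Formula (8) can be evaluated directly once the removable 0/0 forms at z = 0, 1, ..., n̄ − 1 are handled. This Python sketch follows the idea of Appendix B, which uses a short Taylor series for sin(πx)/(πx) near x = 0. The threshold of 10⁻⁴ and the helper names are assumptions of the sketch, not the report's code.

```python
import math


def _sinc(x):
    """sin(pi x)/(pi x), using a short Taylor series when |x| is tiny."""
    if abs(x) < 1e-4:
        t = (math.pi * x) ** 2
        return 1.0 - t / 6.0 + t * t / 120.0
    return math.sin(math.pi * x) / (math.pi * x)


def taylor_pattern(z, nbar, sll_db):
    """Taylor beampattern F(z) from (8), normalized so that F(0) = 1."""
    A = math.acosh(10.0 ** (-sll_db / 20.0)) / math.pi
    sigma = nbar / math.sqrt(A * A + (nbar - 0.5) ** 2)
    z = abs(z)  # F is even
    k = round(z)
    singular = 1 <= k <= nbar - 1 and abs(z - k) < 1e-4
    if singular:
        # sin(pi z) / (pi z (1 - z^2/k^2)) rewritten without the 0/0 at z = k
        value = (-1) ** (k + 1) * _sinc(z - k) * k * k / (z * (k + z))
    else:
        value = _sinc(z)
    for n in range(1, nbar):
        value *= 1.0 - z * z / (sigma * sigma * (A * A + (n - 0.5) ** 2))
        if not (singular and n == k):
            value /= 1.0 - (z / n) ** 2
    return value


print([round(taylor_pattern(z, 5, -20.0), 6) for z in (0.0, 0.5, 1.0, 2.0, 5.0)])
```

The rewrite at z ≈ k uses sin(πz) = (−1)^k sin(π(z − k)). This cancels the factor 1 − z²/k² analytically, so both branches agree to within the series truncation.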

It is clear that Taylor's beampattern can be computed analytically from (8) without computing the weighting function at all. The representation (2) of F(z) is

$$F(z) = \frac{1}{2}\sum_{m=0}^{\bar n-1}\varepsilon_m F(m)\left[\frac{\sin\pi(z-m)}{\pi(z-m)} + \frac{\sin\pi(z+m)}{\pi(z+m)}\right], \tag{9}$$

since F(n) = 0 for n ≥ n̄. This is not as efficient as using (8). However, it does yield an efficient way to compute the weighting function g(p). By Fourier transforming it term by term and using the fact that F(m) = F(−m), we get


$$g(p) = \frac{1}{2\pi}\left[F(0) + 2\sum_{m=1}^{\bar n-1}F(m)\cos mp\right], \qquad |p| \le \pi. \tag{10}$$

This is the (spatial) Fourier series of Taylor's weighting function. By computing once and for all the constants F(0), F(1), ..., F(n̄ − 1) using (8), the series (10) becomes an efficient formula for computation.
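The cosine series (10) needs only the samples F(0), ..., F(n̄ − 1). This sketch is illustrative rather than the report's program. It writes the limit of (8) at an integer m in closed form:

F(m) = (−1)^(m+1) Π_n (1 − m²/z_n²) / [2 Π_{n≠m} (1 − m²/n²)]

This is what the Appendix B construction reduces to at z = m. A and σ are computed as in the Appendix A listing.

```python
import math


def taylor_coefficients(nbar, sll_db):
    """F(0), F(1), ..., F(nbar - 1) for Taylor's pattern (8)."""
    A = math.acosh(10.0 ** (-sll_db / 20.0)) / math.pi
    sigma = nbar / math.sqrt(A * A + (nbar - 0.5) ** 2)
    coeffs = [1.0]  # F(0) = 1
    for m in range(1, nbar):
        num, den = 1.0, 1.0
        for n in range(1, nbar):
            num *= 1.0 - m * m / (sigma * sigma * (A * A + (n - 0.5) ** 2))
            if n != m:
                den *= 1.0 - (m / n) ** 2
        # closed-form limit of (8) at the integer m
        coeffs.append((-1) ** (m + 1) * num / (2.0 * den))
    return coeffs


def taylor_weight(p, coeffs):
    """Weighting function g(p) from the cosine series (10); zero for |p| > pi."""
    if abs(p) > math.pi:
        return 0.0
    tail = sum(c * math.cos(m * p) for m, c in enumerate(coeffs[1:], start=1))
    return (coeffs[0] + 2.0 * tail) / (2.0 * math.pi)


c = taylor_coefficients(5, -20.0)
print(c)
print([taylor_weight(p, c) for p in (0.0, math.pi / 2, math.pi)])
```

The same two functions can be used to evaluate the n̄ = 100, S = −20 dB case discussed later in this section, for example at p = 3.078761.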

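The beamwidth measured between the first nulls is (from (6) with n = 1)

$$BW_{\text{null}} = 2\sigma\sqrt{A^2 + \tfrac{1}{4}},$$

where σ and A are given as above. An exact formula for the half-power beamwidth is not available. Table 2 gives half-power beamwidths that were computed numerically (using a general purpose subroutine in [3, Chapter 7]). More useful, perhaps, is the following approximate formula for the half-power beamwidth:

$$BW_{3\text{dB}} \approx \sqrt{2}\left[\frac{\pi^2}{6} + \sum_{n=1}^{\bar n-1}\left(\frac{1}{\sigma^2\left(A^2 + (n-\frac{1}{2})^2\right)} - \frac{1}{n^2}\right)\right]^{-1/2}. \tag{11}$$

To prove (11), note that the asymptotic expansion

$$F(z) \sim 1 - \left[\sum_{n=1}^{\bar n-1}\frac{1}{z_n^2} + \sum_{n=\bar n}^{\infty}\frac{1}{n^2}\right]z^2, \qquad z \to 0,$$

follows immediately from (7). Since

$$\sum_{n=\bar n}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6} - \sum_{n=1}^{\bar n-1}\frac{1}{n^2},$$

we have

$$F(z) \sim 1 - \left[\frac{\pi^2}{6} + \sum_{n=1}^{\bar n-1}\left(\frac{1}{\sigma^2\left(A^2 + (n-\frac{1}{2})^2\right)} - \frac{1}{n^2}\right)\right]z^2, \qquad z \to 0.$$

Setting F(z) = 1/2, solving for z, and doubling (the beamwidth spans −z to +z) gives (11).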
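Approximation (11) is a one-line sum. Here is a minimal sketch, with A and σ computed as in the Appendix A listing; the function name is illustrative.

```python
import math


def approx_half_power_bw(nbar, sll_db):
    """Approximate half-power beamwidth (11): width between the points where F(z) = 1/2."""
    A = math.acosh(10.0 ** (-sll_db / 20.0)) / math.pi
    sigma = nbar / math.sqrt(A * A + (nbar - 0.5) ** 2)
    s = math.pi ** 2 / 6.0
    for n in range(1, nbar):
        s += 1.0 / (sigma * sigma * (A * A + (n - 0.5) ** 2)) - 1.0 / n ** 2
    return math.sqrt(2.0 / s)  # sqrt(2) * s**(-1/2)


for nbar in (1, 5, 10, 100):
    print(nbar, approx_half_power_bw(nbar, -20.0))
```

For n̄ = 1 it returns √12/π ≈ 1.103. For n̄ = 5 and S = −20 dB it returns about 1.194, roughly 10 percent below the Table 2 entry 1.3264, in line with Table 3.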

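The accuracy of (11) is good in two limiting cases. As n̄ → ∞, σ → 1 and the sum of the 1/n² terms cancels π²/6, so (11) becomes

$$BW_{3\text{dB}} \approx \sqrt{2}\left[\sum_{n=1}^{\infty}\frac{1}{A^2 + (n-\frac{1}{2})^2}\right]^{-1/2}.$$

The exact answer for the van der Maas function is

$$BW_{3\text{dB}} = 2\left\{A^2 - \left[\frac{1}{\pi}\cosh^{-1}\left(\tfrac{1}{2}\cosh\pi A\right)\right]^2\right\}^{1/2}, \tag{12}$$

and a comparison with (12) is given in the last row of Table 3. Similarly, for n̄ = 1, the sum in (11) vanishes and

$$BW_{3\text{dB}} \approx \frac{\sqrt{12}}{\pi} \approx 1.103 \text{ radians},$$

which is within 10 percent of the correct answer of ≈ 1.207 radians for the uniformly weighted aperture.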

Table  3  gives  the  relative  error  between  the  approximation  (11)  and  the 
exact  half-power  beamwidth  for  the  same  entries  as  in  Table  2.  It  may  be 
concluded  from  Table  3  that 

(a)  the  approximation  (11)  is  always  on  the  low  side  of  the  exact 
half-power  beamwidth,  and 

(b) the correction required to make (11) exact is a constant factor which depends strongly on the specified sidelobe level and very weakly on n̄.

Consequently,  a  suitable  correction  factor  depending  only  on  specified 
sidelobe  level  would  make  (11)  very  accurate. 
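The exact half-power beamwidths of Table 2 can be reproduced by root finding; the report does this with the zeroin routine of [3]. Following (11), the half-power points are taken where F(z) = 1/2. This sketch makes two substitutions:

- It uses plain bisection instead of zeroin.
- It evaluates the tail of (7) in the closed Gamma-function form Π_{n≥n̄}(1 − z²/n²) = Γ(n̄)²/[Γ(n̄ − z)Γ(n̄ + z)], which is free of 0/0 forms for 0 ≤ z < n̄.

The n̄ → ∞ row uses (12) directly.

```python
import math


def _taylor_params(nbar, sll_db):
    A = math.acosh(10.0 ** (-sll_db / 20.0)) / math.pi
    return A, nbar / math.sqrt(A * A + (nbar - 0.5) ** 2)


def taylor_pattern_gamma(z, nbar, sll_db):
    """Taylor pattern F(z) from (7), valid for 0 <= z < nbar.

    Assumption of this sketch: the tail product is evaluated as
    prod_{n >= nbar} (1 - z^2/n^2) = Gamma(nbar)^2 / (Gamma(nbar - z) Gamma(nbar + z)).
    """
    A, sigma = _taylor_params(nbar, sll_db)
    value = math.exp(2.0 * math.lgamma(nbar) - math.lgamma(nbar - z) - math.lgamma(nbar + z))
    for n in range(1, nbar):
        value *= 1.0 - z * z / (sigma * sigma * (A * A + (n - 0.5) ** 2))
    return value


def exact_half_power_bw(nbar, sll_db, tol=1e-12):
    """Solve F(z) = 1/2 by bisection between z = 0 and the first null; return 2z."""
    A, sigma = _taylor_params(nbar, sll_db)
    lo, hi = 0.0, sigma * math.sqrt(A * A + 0.25)  # F(lo) = 1, F(hi) = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if taylor_pattern_gamma(mid, nbar, sll_db) > 0.5:
            lo = mid
        else:
            hi = mid
    return lo + hi  # twice the midpoint of the final bracket


def van_der_maas_half_power_bw(sll_db):
    """Exact van der Maas half-power beamwidth, eq. (12)."""
    A = math.acosh(10.0 ** (-sll_db / 20.0)) / math.pi
    w = math.acosh(0.5 * math.cosh(math.pi * A)) / math.pi
    return 2.0 * math.sqrt(A * A - w * w)


for nbar in (1, 5, 10):
    print(nbar, exact_half_power_bw(nbar, -20.0))
print("inf", van_der_maas_half_power_bw(-20.0))
```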

The Taylor weighting function need not always be a positive function. The best way to show this is by example. Consider the case n̄ = 100 and a sidelobe level of S = −20 dB. The weighting function is slightly negative just inside the aperture endpoints (for p = 3.078761, for example, Taylor's weight is −.005519929). See Figure 2. In practice, the Taylor function is nearly always positive for smaller values of n̄.


  n̄     -10 dB (A = .578)   -20 dB (A = .953)   -30 dB (A = 1.32)   -40 dB (A = 1.69)
   5        1.0475              1.3264              1.5526              1.7323
  10        1.0009              1.2818              1.5220              1.7262
  15         .9851              1.2641              1.5051              1.7126
  20         .9771              1.2548              1.4954              1.7036
  25         .9724              1.2491              1.4892              1.6975
  30         .9692              1.2452              1.4849              1.6932
 100         .9581              1.2313              1.4691              1.6761
   ∞         .9533              1.2252              1.4619              1.6680

Table 2. Exact Taylor half-power beamwidths (radians).

  n̄     -10 dB (A = .578)   -20 dB (A = .953)   -30 dB (A = 1.32)   -40 dB (A = 1.69)
   5         7.67%               9.98%               11.4%               12.2%
  10         7.57%               9.91%               11.3%               12.2%
  15         7.55%               9.90%               11.3%               12.2%
  20         7.54%               9.89%               11.3%               12.2%
  25         7.55%               9.89%               11.3%               12.2%
  30         7.54%               9.89%               11.3%               12.2%
 100         7.55%               9.88%               11.3%               12.1%
   ∞         7.54%               9.88%               11.3%               12.1%

Table 3. Relative error, (exact − approximation)/exact, of approximation (11) to Taylor half-power beamwidths.




IV.  MODIFICATIONS  OF  TAYLOR  WEIGHTING 

Rhodes [4,5] shows that the Taylor weighting function g(p) can be made to go to zero as any power α > −1 of distance from the aperture endpoints by altering the position of the nulls in Taylor's function F(z). The general design technique can be viewed as an extension of certain ideas in Taylor's original paper [1], using mathematical methods developed by Rhodes. The most important cases are

1. α = 0, which is exactly Taylor's original case; F(z) decays asymptotically at 6 dB per octave.

2. α = 1, for which the weighting function goes to zero linearly at the aperture endpoints; F(z) decays asymptotically at 12 dB per octave.

3. α = 2, for which the weighting function goes to zero quadratically at the aperture endpoints; F(z) decays asymptotically at 18 dB per octave.

The cases α = 1 and α = 2 are given explicitly below, after giving the method for any α > −1.

A theoretically significant criticism of Rhodes' work is that he does not prove that his technique is mathematically correct. The available theory (due to Paley and Wiener, and to Levinson) provides a proof only for −1/2 < α < 1/2 and α = 1. As Rhodes states [5], "it is not unreasonable to expect that the general theory" is valid for all α > −1. In any event, we can proceed to develop the method for all α > −1 in a purely formal way, ignoring a theoretical question which may in the end not be of any practical importance. Taylor's original method is, after all, an ad hoc technique, and so is Rhodes' generalization of it.

Rhodes' development retains the integer n̄ as the breakpoint between the near-in nulls, which are dilated versions of van der Maas' nulls, and the outer nulls, which force the asymptotic decay rate for F(z) to be 6(1 + α) dB per octave. Consequently, in the limit as n̄ → ∞ the van der Maas function is again obtained for all α > −1, just as in Taylor's original design α = 0. This means that, for larger n̄, the desired behavior of the weighting function at the aperture endpoints is confined to small neighborhoods of the endpoints. In other words, the weighting function changes rapidly just inside the aperture endpoints for large n̄.

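The development in [4] is brief and only the case α = 1 is given in any detail. His later paper [5] gives enough detail to carry out the general development for α > −1. This requires the identity, valid for α > −1,

$$T_\alpha(z) \equiv \prod_{n=1}^{\infty}\left(1 - \frac{z^2}{(n+\frac{\alpha}{2})^2}\right) = \Gamma^2\!\left(1+\frac{\alpha}{2}\right)\frac{\Gamma(z+1-\frac{\alpha}{2})}{\Gamma(z+1+\frac{\alpha}{2})}\,\frac{\sin\pi(z-\frac{\alpha}{2})}{\pi(z-\frac{\alpha}{2})}. \tag{13}$$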


It is proved as follows. A special case (z_1 = z_2 = α/2 and z_3 = z) of a result in [6, Eq. 1.3(4)] gives


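Dividing by the first term in the infinite product, and then using the recurrence formula [7, Eq. 6.1.15] and the reflection formula [7, Eq. 6.1.17] whenever necessary, gives

$$\prod_{n=1}^{\infty}\left(1 - \frac{z^2}{(n+\frac{\alpha}{2})^2}\right) = \frac{\Gamma^2(1+\frac{\alpha}{2})\,\Gamma(z-\frac{\alpha}{2})}{\Gamma(1+\frac{\alpha}{2}+z)\,\Gamma(z-\frac{\alpha}{2})\,\Gamma\!\left(1-(z-\frac{\alpha}{2})\right)} = \frac{\Gamma^2(1+\frac{\alpha}{2})\,\Gamma(z-\frac{\alpha}{2})\,\sin\pi(z-\frac{\alpha}{2})}{\pi\,\Gamma(z+\frac{\alpha}{2}+1)}. \tag{14}$$

Multiplying and dividing the right-hand side of the last equation by z − α/2, and using (z − α/2)Γ(z − α/2) = Γ(z + 1 − α/2), yields (13). (We note that (13) is given without proof by Taylor [1, Eq. (29)].)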
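Identity (13) can be spot-checked numerically. In this sketch the truncation length and the test points are arbitrary. The closed form is evaluated with `math.gamma` and is valid as long as z − α/2 is not an integer and no Gamma argument is a pole.

```python
import math


def tail_product(z, alpha, terms=200_000):
    """Truncated left side of (13): prod_{n=1}^{terms} (1 - z^2/(n + alpha/2)^2)."""
    a = alpha / 2.0
    prod = 1.0
    for n in range(1, terms + 1):
        prod *= 1.0 - (z / (n + a)) ** 2
    return prod


def tail_closed_form(z, alpha):
    """Right side of (13); assumes z - alpha/2 is not an integer."""
    a = alpha / 2.0
    gamma_ratio = math.gamma(z + 1.0 - a) / math.gamma(z + 1.0 + a)
    sinc = math.sin(math.pi * (z - a)) / (math.pi * (z - a))
    return math.gamma(1.0 + a) ** 2 * gamma_ratio * sinc


for alpha in (0.0, 0.5, 1.0, 2.0):
    print(alpha, tail_product(0.7, alpha), tail_closed_form(0.7, alpha))
```

For α = 1 and α = 2 the closed form reduces to cos πz/(1 − 4z²) and sin πz/[πz(1 − z²)], respectively. These are the special cases used later in this section.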

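Rhodes defines the general Taylor pattern, F_α(z), for α > −1 to be

$$F_\alpha(z) = \prod_{n=1}^{\bar n-1}\left(1 - \frac{z^2}{z_n^2}\right)\prod_{n=\bar n}^{\infty}\left(1 - \frac{z^2}{(n+\frac{\alpha}{2})^2}\right), \qquad z_n = \sigma_\alpha\sqrt{A^2 + (n-\tfrac{1}{2})^2}, \tag{15}$$

where

$$\sigma_\alpha = \frac{\bar n + \frac{\alpha}{2}}{\sqrt{A^2 + (\bar n - \frac{1}{2})^2}} \tag{16}$$

and A is the same as given above (just after (6)). Note that for α = 0 the function F_0(z) is exactly Taylor's original function F(z). The analog of (8) for general α is

$$F_\alpha(z) = T_\alpha(z)\prod_{n=1}^{\bar n-1}\frac{1 - z^2/z_n^2}{1 - z^2/(n+\frac{\alpha}{2})^2}, \tag{17}$$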


as is clear from (15) and (13), where T_α(z) denotes the infinite product in (13). As z → ∞, the rational function of degree n̄ − 1 in z² in (17) approaches a constant, and T_α(z) is asymptotic to a constant (depending only on α) times 1/|z|^(1+α). The asymptotic decay rate of F_α(z) is therefore 6(1 + α) dB per octave. In addition, the asymptotic decay rate means that F_α(z) has a Fourier transform vanishing outside the aperture [−π, π] for every α > −1 and n̄ ≥ 1.

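For α > −1 define the "sampling functions"

$$G_n^{(\alpha)}(z) = \frac{C_n(\alpha)\,\Gamma(z - \frac{\alpha}{2} + 1)\,\sin\pi(z - \frac{\alpha}{2})}{\pi\,\Gamma(z + \frac{\alpha}{2})\left(z^2 - (n + \frac{\alpha}{2})^2\right)}, \qquad n = 0, 1, 2, \ldots, \tag{18}$$

where

$$C_n(\alpha) = \begin{cases}(-1)^n(2n+\alpha)\,\Gamma(n+\alpha)/n!, & \alpha \ne 0,\\ 1, & \alpha = 0,\ n = 0,\\ 2(-1)^n, & \alpha = 0,\ n = 1, 2, 3, \ldots\end{cases}$$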

Each G_n^{(α)}(z) is an even function of z. These functions are essentially the Lagrange interpolating functions for the points {±(n + α/2); n = 0, 1, 2, ...}. More precisely, the only nulls of G_n^{(α)}(z) are of the form ±(m + α/2), and, furthermore,

$$G_n^{(\alpha)}\left(\pm(m + \tfrac{\alpha}{2})\right) = \begin{cases}1, & m = n,\\ 0, & m \ne n.\end{cases} \tag{19}$$


The functions G_n^{(α)}(z) are derived using methods due originally to Paley and Wiener.
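Definition (18) and the interpolation property (19) can be checked numerically. This minimal sketch assumes that z stays away from Gamma-function poles and from the removable singularities at ±(n + α/2). The check below evaluates 10⁻⁷ away from them. The helper names are hypothetical.

```python
import math


def c_n(n, alpha):
    """Normalizing constant C_n(alpha) of (18)."""
    if alpha == 0:
        return 1.0 if n == 0 else 2.0 * (-1) ** n
    return (-1) ** n * (2 * n + alpha) * math.gamma(n + alpha) / math.factorial(n)


def sampling_function(n, alpha, z):
    """G_n^(alpha)(z) of (18).

    Assumes neither z + alpha/2 nor z - alpha/2 + 1 is 0, -1, -2, ... (Gamma poles)
    and that z is not exactly +/-(n + alpha/2), where (18) has a removable 0/0.
    """
    a = alpha / 2.0
    num = c_n(n, alpha) * math.gamma(z - a + 1.0) * math.sin(math.pi * (z - a))
    den = math.pi * math.gamma(z + a) * (z * z - (n + a) ** 2)
    return num / den


# Property (19) for alpha = 1, n = 2: about 1 near z = 2.5, about 0 at 0.5, 1.5, 3.5
print([sampling_function(2, 1.0, z) for z in (0.5, 1.5, 2.5 + 1e-7, 3.5)])
```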


The open theoretical question mentioned earlier in this section concerns the completeness of the sampling functions (18) with respect to all even aperture-limited functions. As stated already, it is known that the G_n^{(α)} are complete for −1/2 < α < 1/2 and for α = 1. For other values of α > −1, nothing is known. Proceeding on the assumption that the G_n^{(α)} are complete for all α > −1, it follows that F_α(z) can be expanded in the form

$$F_\alpha(z) = \sum_{n=0}^{\infty} a_n\,G_n^{(\alpha)}(z)$$


for some constants a_n. From (19) it follows that a_n = F_α(n + α/2) for all n. From (15) it follows that a_n = 0 for n ≥ n̄. Therefore,

$$F_\alpha(z) = \sum_{n=0}^{\bar n-1}F_\alpha\!\left(n + \tfrac{\alpha}{2}\right)G_n^{(\alpha)}(z), \tag{20}$$

which generalizes (9).

Denote by g_α(p) the weighting function corresponding to F_α(z). As just stated, g_α(p) vanishes outside the aperture [−π, π]. Question: does g_α(p) go to zero as the power α > −1 of distance from the aperture endpoints? Taylor [1] proves that any even function with this endpoint behavior has a Fourier transform whose nulls are asymptotic to ±(n + α/2), but he does NOT prove the converse. Consequently, although g_α(p) is even and has a Fourier transform with the proper null locations, this is not necessarily sufficient to answer the question in the affirmative. However, taking the term-by-term Fourier transform of (20) gives the expansion

$$g_\alpha(p) = \sum_{n=0}^{\bar n-1}F_\alpha\!\left(n + \tfrac{\alpha}{2}\right)H_n^{(\alpha)}(p), \tag{21}$$

where, for n = 0, 1, 2, ...,

n 

H«(P)  .  (2  cos  $)•£  T  TH%c«<rp>*|P|  ‘  ’  <22> 

r  »  0 

0  ,|P|>* 

where ε_0 = 1 and ε_r = 2 for r > 0, and

$$(a)_k = \begin{cases}1 \ \text{for all } a, & k = 0,\\ a(a+1)\cdots(a+k-1) \ \text{for all } a, & k > 0.\end{cases}$$

Since each of the functions H_n^{(α)}(p) has the correct endpoint behavior, g_α(p) must also have this same behavior.


Just as in Taylor's original case, both the aperture function g_α(p) and the beampattern function F_α(z) can be computed independently of each other, for any α > −1. They use the analytically exact formulas (21) and (17), respectively. Appropriate approximations near the points ±(n + α/2), analogous to those developed in Appendix B for α = 0, are necessary for computing F_α(z) using (17). Developing these approximations should not present any mathematical difficulties.

The three cases α = 0, 1, 2 are now given explicitly. Fortran programs implementing these three cases should be easy to write. The sampling functions are


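$$G_n^{(0)}(z) = (-1)^n\,\varepsilon_n\,\frac{z\sin\pi z}{\pi\left(z^2 - n^2\right)}, \tag{23}$$

$$G_n^{(1)}(z) = (-1)^{n+1}(2n+1)\,\frac{\cos\pi z}{\pi\left(z^2 - (n+\frac{1}{2})^2\right)}, \tag{24}$$

$$G_n^{(2)}(z) = (-1)^{n+1}\,2(n+1)^2\,\frac{\sin\pi z}{\pi z\left(z^2 - (n+1)^2\right)}, \tag{25}$$

and the corresponding aperture basis functions are (for |p| ≤ π; each vanishes for |p| > π)

$$H_n^{(0)}(p) = \frac{\varepsilon_n}{2\pi}\cos np, \tag{26}$$

$$H_n^{(1)}(p) = \frac{1}{\pi}\cos\left(n+\tfrac{1}{2}\right)p, \tag{27}$$

$$H_n^{(2)}(p) = \frac{1}{\pi}\left[(-1)^n + \cos(n+1)p\right]. \tag{28}$$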


It should be noted that (23) and (26) are, within a scale factor, identical to (9) and (10), respectively. In all cases, the aperture function g_α(p) is computed from (21). Consequently, the only potential difficulty is computing the constants F_α(n + α/2) for n = 0, 1, ..., n̄ − 1. Fortunately, for α = 0, 1, and 2, these constants are easy to compute using (17) since the following identities hold:


$$T_0(z) = \frac{\sin\pi z}{\pi z}, \qquad T_1(z) = \frac{\cos\pi z}{1 - 4z^2}, \qquad T_2(z) = \frac{\sin\pi z}{\pi z\,(1 - z^2)}.$$
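For α = 0, 1, 2 the weighting function follows from (21) with the explicit basis functions (26)-(28). This sketch departs from the route suggested above in two ways:

- Instead of (17) and the T_α identities, it evaluates the constants F_α(n + α/2) from the product (15). The tail of that product is written as Γ(b)²/[Γ(b − z)Γ(b + z)] with b = n̄ + α/2, which has no 0/0 forms at these points.
- It assumes σ_α = (n̄ + α/2)/√(A² + (n̄ − ½)²).

The function names are illustrative.

```python
import math


def rhodes_constants(nbar, sll_db, alpha):
    """Samples F_alpha(n + alpha/2), n = 0, ..., nbar - 1, of the pattern (15)."""
    A = math.acosh(10.0 ** (-sll_db / 20.0)) / math.pi
    sigma_a = (nbar + alpha / 2.0) / math.sqrt(A * A + (nbar - 0.5) ** 2)
    b = nbar + alpha / 2.0
    constants = []
    for m in range(nbar):
        z = m + alpha / 2.0
        # tail prod_{n >= nbar} (1 - z^2/(n + alpha/2)^2) in Gamma-function form
        value = math.exp(2.0 * math.lgamma(b) - math.lgamma(b - z) - math.lgamma(b + z))
        for n in range(1, nbar):
            value *= 1.0 - z * z / (sigma_a ** 2 * (A * A + (n - 0.5) ** 2))
        constants.append(value)
    return constants


def aperture_basis(n, alpha, p):
    """H_n^(alpha)(p) of (26)-(28) for alpha in {0, 1, 2}; zero outside [-pi, pi]."""
    if abs(p) > math.pi:
        return 0.0
    if alpha == 0:
        return (1.0 if n == 0 else 2.0) * math.cos(n * p) / (2.0 * math.pi)
    if alpha == 1:
        return math.cos((n + 0.5) * p) / math.pi
    if alpha == 2:
        return ((-1) ** n + math.cos((n + 1) * p)) / math.pi
    raise ValueError("only alpha = 0, 1, 2 are tabulated in (26)-(28)")


def rhodes_weight(p, constants, alpha):
    """Weighting function g_alpha(p) from (21)."""
    return sum(c * aperture_basis(n, alpha, p) for n, c in enumerate(constants))


for alpha in (0, 1, 2):
    consts = rhodes_constants(5, -20.0, alpha)
    print(alpha, rhodes_weight(0.0, consts, alpha), rhodes_weight(math.pi, consts, alpha))
```

In this example g_1 and g_2 vanish at p = ±π while g_0 keeps a pedestal. The integral of g_α over the aperture equals F_α(0) = 1 in each case.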


The price paid for the desired end effects is an increase in the beamwidth over the beamwidth of Taylor's original weighting function. The beamwidth measured between the first nulls is, for all α > −1,

$$BW_{\text{null}} = 2\sigma_\alpha\sqrt{A^2 + \tfrac{1}{4}},$$

where σ_α is given by (16) above. This gives exactly, for fixed n̄,

$$\frac{BW_{\text{null}}(\text{any } \alpha)}{BW_{\text{null}}(\alpha = 0)} = \frac{\sigma_\alpha}{\sigma_0} = 1 + \frac{\alpha}{2\bar n}.$$

This means that for n̄ = 10, the beamwidth measured between nulls is 5% larger for α = 1 and 10% larger for α = 2 than for Taylor's original α = 0 beampattern. It is anticipated that approximately the same percentage increases occur in the half-power beamwidths.
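The beamwidth ratio is simple arithmetic. Here is a tiny check, assuming σ_α = (n̄ + α/2)/√(A² + (n̄ − ½)²); the sidelobe level −30 dB is an arbitrary choice and cancels out of the ratio.

```python
import math


def first_null_beamwidth(nbar, sll_db, alpha):
    """BW_null = 2 sigma_alpha sqrt(A^2 + 1/4), with sigma_alpha as assumed above."""
    A = math.acosh(10.0 ** (-sll_db / 20.0)) / math.pi
    sigma_alpha = (nbar + alpha / 2.0) / math.sqrt(A * A + (nbar - 0.5) ** 2)
    return 2.0 * sigma_alpha * math.sqrt(A * A + 0.25)


base = first_null_beamwidth(10, -30.0, 0)
for alpha in (1, 2):
    print(alpha, first_null_beamwidth(10, -30.0, alpha) / base)  # 1 + alpha/(2*nbar)
```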




A  different  modification  to  Taylor's  nulls  can  be  utilized  to  produce 
asymmetric  beampatterns  using  complex  valued  aperture  functions  g(p).  This  is 
described  in  [8]  for  an  application  in  radar  to  minimize  ground  clutter.  The 
magnitude  of  the  aperture  function  turns  out  to  be  an  even  function,  while  the 
phase  of  the  aperture  function  turns  out  to  be  odd. 

V.  CONCLUSIONS 

Taylor weighting can be modified to force the weighting function to go to zero as any power α > −1 of distance from the aperture endpoints. Taylor's original weighting (α = 0) results in a pedestal. For α = 1 the weighting function goes to zero linearly, as in a cosine window. For α = 2 the weighting function behaves like a cosine-squared window at the aperture endpoints. The endpoint effect is achieved for a modest increase in the mainlobe beamwidth.




Appendix  A 


c     COMPUTE TAYLOR'S CONTINUOUS SHADING FUNCTION AND TRANSFER FUNCTION
c
c     THE SPATIAL APERTURE RUNS FROM -PI TO +PI
c     ARGUMENT DEFINITIONS:
c
c       x     = abscissas for sampling the shading function
c       s     = shading function values
c       ns    = number of s samples; none computed if ns = 0
c       k     = abscissas for sampling the transfer function
c       f     = transfer function values
c       nk    = number of f samples; none computed if nk = 0
c       nbar  = the first nbar-1 zeros of f are those of van der Maas
c       db    = sidelobe level in dB of limiting Dolph-Chebyshev array
c       fm    = coefficients of the shading function cosine series
c       bw3db = -3 dB beamwidth; not computed if bw3db is set to -1.
c     DIMENSION LIMITS:
c       x and s must be dimensioned at least max(ns,1)
c       k and f must be dimensioned at least max(nk,1)
c       fm must be dimensioned at least nbar
c     TECHNICAL NOTES:
c       NBAR=1 GIVES THE UNIFORM SHADING FUNCTION
c       NBAR=INFINITY GIVES DOLPH-CHEBYSHEV SHADING
c       FIRST BEAMPATTERN NULL = SIGMA * SQRT(A**2+.25)
c       THE COSINE SERIES FOR S HAS DEGREE EXACTLY NBAR-1
c     PROGRAMMER: R. L. STREIT, NUSC, DECEMBER 21, 1984.
c     LAST REVISION JANUARY 11, 1985
c
      subroutine taylor(x,s,ns,k,f,nk,nbar,db,fm,bw3db)
      double precision x(1),s(1),k(1),f(1),db,fm(1),xpt,sigma,a,pi,
     *   q,g,zeroin,ap,bp,tol,bw3db
c     q is passed to zeroin as an actual argument, so it must be external
      external q
      data pi/3.14159265358979d0/
c     a = (1/pi)*arccosh(10**(|db|/20)); sigma = Taylor's dilation factor
      a=10.0d0**(abs(db)/20.0d0)
      a=dlog(a+sqrt(a*a-1.0d0))/pi
      sigma=nbar/sqrt(a*a+(nbar-.5d0)*(nbar-.5d0))
      nbar1=nbar-1
c     cosine-series coefficients fm(i) = F(i-1), i = 1,...,nbar
      fm(1)=1.0d0
      if(nbar.eq.1)go to 15
      do 10 i=2,nbar
      xpt=i-1
      fm(i)=q(xpt,a,nbar1,sigma)
   10 continue
c     shading function samples (zero outside the aperture)
   15 if(ns.le.0)go to 25
      do 20 i=1,ns
      s(i)=0.0d0
      if(abs(x(i)).gt.(pi+1.d-7))go to 20
      s(i)=g(x(i),fm,nbar)
   20 continue
c     transfer function (beampattern) samples
   25 if(nk.le.0)go to 35
      do 30 i=1,nk
      f(i)=q(k(i),a,nbar1,sigma)
   30 continue
c     -3 dB beamwidth: solve F(z) = 1/2 between 0 and the first null
   35 if(bw3db.lt.0.0d0)go to 40
      ap=0.0d0
      bp=sigma*sqrt(a*a+.25d0)
      tol=1.0d-13
      bw3db=2.0d0*zeroin(ap,bp,q,tol,a,nbar1,sigma)
   40 continue
      return
      end

      double precision function q(z,a,nbar1,sigma)
c     Taylor beampattern F(z) of eq. (8); near z = 0,1,...,nbar-1 the
c     0/0 forms are removed with the series of Appendix B
      double precision pi,piz,zn,z,a,sigma
      data pi/3.14159265358979d0/
      do 10 n=0,nbar1
      if(abs(z-n).lt.1.0d-4)go to 30
   10 continue
      q=1.0d0
      if(nbar1.eq.0)go to 21
      do 20 n=1,nbar1
      zn=(z/sigma)**2/(a*a+(n-.5d0)**2)
      q=q*(1.0d0-zn)/(1.0d0-(z/n)**2)
   20 continue
   21 piz=pi*z
      q=(sin(piz)/piz)*q
      return
c     z is near 0
   30 if(n.gt.0)go to 50
      q=1.0d0
      if(nbar1.eq.0)go to 41
      do 40 n=1,nbar1
      zn=(z/sigma)**2/(a*a+(n-.5d0)**2)
      q=q*(1.0d0-zn)/(1.0d0-(z/n)**2)
   40 continue
   41 piz=pi*z
      q=(1.0d0-piz*piz*(1.0d0-piz*piz/20.0d0)/6.0d0)*q
      return
c     z is near the integer n, 1 .le. n .le. nbar-1
   50 q=1.0d0
      do 60 m=1,nbar1
      if(m.eq.n)go to 60
      zn=(z/sigma)**2/(a*a+(m-.5d0)**2)
      q=q*(1.0d0-zn)/(1.0d0-(z/m)**2)
   60 continue
      zn=(z/sigma)**2/(a*a+(n-.5d0)**2)
      q=q*(1.0d0-zn)
      piz=pi*(z-n)
      q=(1.0d0-piz*piz*(1.0d0-piz*piz/20.0d0)/6.0d0)*q
      q=q*(-1)**(n+1)*(n/(z+z*z/n))
      return
      end

      double precision function g(p,fm,nbar)
c     shading function from the cosine series, eq. (10)
      double precision p,twopi,fm(1)
      data twopi/6.283185307179586477d0/
      g=0.0d0
      if(nbar.eq.1)go to 20
      do 10 i=2,nbar
      g=g+fm(i)*cos((i-1)*p)
   10 continue
   20 g=(fm(1)+g+g)/twopi
      return
      end
c
c     Compute a zero of a real function f in the interval (ax,bx).
c     Double precision version of program on pp. 164-166 of "Computer
c     Methods for Mathematical Computations," by G. E. Forsythe,
c     M. A. Malcolm, and C. B. Moler, Prentice-Hall, 1977, but slightly
c     altered for use in computing Taylor's half-power beamwidth.
c
      double precision function zeroin(ax,bx,f,tol,adum,nbar1,sigma)
      double precision ax,bx,f,tol,adum,sigma
      double precision a,b,c,d,e,eps,fa,fb,fc,tol1,xm,p,q,r,s
c     relative machine precision
      eps=1.0d0
   10 eps=eps/2.0d0
      tol1=1.0d0+eps
      if(tol1.gt.1.0d0)go to 10
c     the root sought is F(z) = 1/2
      a=ax
      b=bx
      fa=f(a,adum,nbar1,sigma)-.5d0
      fb=f(b,adum,nbar1,sigma)-.5d0
   20 c=a
      fc=fa
      d=b-a
      e=d
   30 if(abs(fc).ge.abs(fb))go to 40
      a=b
      b=c
      c=a
      fa=fb
      fb=fc
      fc=fa
c     convergence test
   40 tol1=2.0d0*eps*abs(b)+.5d0*tol
      xm=.5d0*(c-b)
      if(abs(xm).le.tol1)go to 90
      if(fb.eq.0.0d0)go to 90
c     is bisection necessary?
      if(abs(e).lt.tol1)go to 70
      if(abs(fa).le.abs(fb))go to 70
c     linear interpolation
      if(a.ne.c)go to 50
      s=fb/fa
      p=2.0d0*xm*s
      q=1.0d0-s
      go to 60
c     inverse quadratic interpolation
   50 q=fa/fc
      r=fb/fc
      s=fb/fa
      p=s*(2.0d0*xm*q*(q-r)-(b-a)*(r-1.0d0))
      q=(q-1.0d0)*(r-1.0d0)*(s-1.0d0)
   60 if(p.gt.0.0d0)q=-q
      p=abs(p)
c     is interpolation acceptable?
      if((2.0d0*p).ge.(3.0d0*xm*q-abs(tol1*q)))go to 70
      if(p.ge.abs(0.5d0*e*q))go to 70
      e=d
      d=p/q
      go to 80
c     bisection
   70 d=xm
      e=d
c     complete step
   80 a=b
      fa=fb
      if(abs(d).gt.tol1)b=b+d
      if(abs(d).le.tol1)b=b+sign(tol1,xm)
      fb=f(b,adum,nbar1,sigma)-.5d0
      if((fb*(fc/abs(fc))).gt.0.0d0)go to 20
      go to 30
   90 zeroin=b
      return
      end




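Appendix B. Calculation of F(z)
For some small number ε > 0, say ε = 10^(−4), define

$$s(z,k) = \begin{cases}\dfrac{\sin\pi z}{\pi(z-k)}, & |z-k| \ge \varepsilon,\\[6pt] (-1)^k\left[1 - \dfrac{\pi^2}{6}(z-k)^2 + \dfrac{\pi^4}{120}(z-k)^4 - \cdots\right], & |z-k| < \varepsilon.\end{cases}$$

Now, if |z − k| ≥ ε for k = 0, 1, 2, ..., n̄ − 1, compute F(z) exactly as in (8). If |z − k| < ε for k = 0, then compute

$$F(z) = s(z,0)\prod_{n=1}^{\bar n-1}\frac{1 - z^2/z_n^2}{1 - z^2/n^2},$$

and if |z − k| < ε for some k = 1, 2, ..., n̄ − 1, compute

$$F(z) = -\frac{k^2}{z(z+k)}\,s(z,k)\left(1 - \frac{z^2}{z_k^2}\right)\prod_{\substack{n=1\\ n\ne k}}^{\bar n-1}\frac{1 - z^2/z_n^2}{1 - z^2/n^2}.$$

Use of these formulae eliminates all indeterminate 0/0 forms that arise during actual computation using (8).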




REFERENCES 


1. T. T. Taylor, "Design of Line-Source Antennas for Narrow Beamwidth and Low Side Lobes," IRE Trans. on Ant. and Prop., vol. AP-3, Jan. 1955, pp. 16-28.

2. V. Barcilon and G. Temes, "Optimum Impulse Response and the van der Maas Function," IEEE Trans. on Circuit Theory, vol. CT-19, July 1972, pp. 336-342.

3. G. E. Forsythe, M. A. Malcolm, and C. B. Moler, Computer Methods for Mathematical Computations, Prentice-Hall, 1977.

4. D. R. Rhodes, "On the Taylor Distribution," IEEE Trans. on Ant. and Prop., vol. AP-20, March 1972, pp. 143-145.

5. D. R. Rhodes, "A General Theory of Sampling Synthesis," IEEE Trans. on Ant. and Prop., vol. AP-21, March 1973, pp. 176-181.

6. A. Erdélyi, Editor, Higher Transcendental Functions, Vol. I, McGraw-Hill, 1953.

7. M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions, NBS Applied Mathematics Series, Vol. 55, U.S. Dept. of Commerce, 1972.

8. R. S. Elliott, "Design of Line Source Antennas for Narrow Beamwidth and Asymmetric Low Sidelobes," IEEE Trans. on Ant. and Prop., vol. AP-23, Jan. 1975, pp. 100-107.




Sufficient  Conditions  For  The  Existence 
Of  Optimum  Beam  Patterns  For 
Unequally  Spaced  Linear  Arrays 
With  An  Example 


R.  L.  Streit 




Sufficient  Conditions  for  the  Existence  of  Optimum  Beam 
Patterns  for  Unequally  Spaced  Linear  Arrays  with  an 
Example 

ROY STREIT

Abstract: Dolph's method for determining the optimum element currents for half-wavelength equispaced discrete linear arrays is generalized to symmetric discrete linear arrays. The theorem proved gives sufficient conditions for the existence of optimum beam patterns for arrays with elements symmetrically positioned about the array center, but with fixed unequal spacings between the elements. The conditions are such that the Remes exchange algorithm for minimax approximation of functions can be employed to compute the optimum element currents corresponding to an optimum beam pattern directly from the given spacings of the elements. Half-wavelength spaced linear arrays satisfy the conditions of the theorem; therefore, it provides a new method of calculating the well-known Dolph-Chebyshev element currents. An example with unequal spacings is included to show the utility of the method even when the hypotheses of the theorem may not be met.


I.  INTRODUCTION 

Optimum beam patterns and element currents for single frequency linear arrays with a finite number of omnidirectional half-wavelength spaced elements were determined by Dolph [1] through a technique involving the Chebyshev polynomials. All these beam patterns have equal amplitude sidelobes. Sufficient conditions are given here for symmetric linear arrays to possess optimum beam patterns with equal amplitude sidelobes. This feature is precisely the fact needed in the calculation of the element currents.

The definition of an optimum beam pattern used in Dolph's paper will be used: a beam pattern is optimum if, for a given main lobe beamwidth, the overall sidelobe amplitude is minimized. Beamwidth is measured from the maximum response axis to the first null. The linear arrays considered in this paper are those whose elements are symmetrically spaced and have symmetrically tapered element currents about the center of the array.


Manuscript received January 1974; revised August 17, 1974. The author is with the Naval Underwater Systems Center, New London Laboratory, New London, Conn. 06320.

IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, JANUARY 1975

II.  PRELIMINARIES 

Let f(x) be a real valued continuous function defined on the closed interval [a, b]. The norm of f(x), denoted ‖f‖_[a,b], is defined to be

$$\|f\|_{[a,b]} = \max_{a \le x \le b} |f(x)|.$$

Now let h_1(x), ..., h_N(x) be a given finite collection of real valued continuous functions defined on the closed interval [a, b]. The linear span of these basis functions is a proper closed subspace of the space of all continuous functions on the interval [a, b] equipped with this max norm. It is known that there exist real constants α_1, ..., α_N such that

$$\left\| f(x) - \sum_{i=1}^{N}\alpha_i h_i(x) \right\|_{[a,b]}$$

is a minimum. The function h(x) = Σ_{i=1}^N α_i h_i(x) is defined to be a minimax approximation to the function f(x) from the basis h_1(x), ..., h_N(x). The crucial property that these basis functions must satisfy to guarantee the uniqueness of a minimax approximation is embodied in the following definition. The functions h_1(x), ..., h_N(x) form a Chebyshev basis of degree N on the closed interval [a, b] if and only if every nontrivial linear combination of these functions possesses at most N − 1 real roots in the interval [a, b]. A particularly well-known Chebyshev basis is the collection 1, x, ..., x^(N−1) on any finite or infinite interval. It is possible that a given collection of functions may be a Chebyshev basis on one interval but not on another. It can be shown that the functions h_1(x), ..., h_N(x) form a Chebyshev basis on the interval [a, b] if and only if the determinants


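$$D(x_1,\ldots,x_N) = \det\begin{pmatrix} h_1(x_1) & h_1(x_2) & \cdots & h_1(x_N)\\ h_2(x_1) & h_2(x_2) & \cdots & h_2(x_N)\\ \vdots & \vdots & & \vdots\\ h_N(x_1) & h_N(x_2) & \cdots & h_N(x_N)\end{pmatrix} \ne 0$$

for all points x_j such that a ≤ x_1 < x_2 < ⋯ < x_(N−1) < x_N ≤ b.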
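The determinant test can be explored by brute force. D is continuous on the connected set a ≤ x_1 < ⋯ < x_N ≤ b, so for a Chebyshev basis it keeps a single sign there. Finding both signs on a grid therefore shows that a basis is not Chebyshev; a single sign on a finite grid is only evidence. The sketch uses cosine bases cos(ξ_i x) of the kind that appear in Section III:

- ξ = (0, 1, 2) corresponds to elements at 0, ±λ/2, ±λ.
- ξ = (1, 3) corresponds to elements at ±λ/2, ±3λ/2.

The grid size and helper names are arbitrary choices.

```python
import itertools
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    n = len(m)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return result


def determinant_signs(xi, a, b, grid=24):
    """Signs of D(x_1, ..., x_N) = det[cos(xi_i * x_j)] over ordered grid points in [a, b]."""
    points = [a + (b - a) * k / (grid - 1) for k in range(grid)]
    signs = set()
    for xs in itertools.combinations(points, len(xi)):
        d = det([[math.cos(x_i * x) for x in xs] for x_i in xi])
        signs.add((d > 0) - (d < 0))
    return signs


print(determinant_signs([0.0, 1.0, 2.0], 0.0, math.pi))
print(determinant_signs([1.0, 3.0], 0.0, math.pi))
```

The first basis is 1, cos u, cos 2u, a polynomial basis in cos u, and it shows a single sign. The second, with full-wavelength spacing, changes sign on [0, π].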

The  reader  is  referred  to  Karlin  and  Studden  [2]  for  a  proof  of  this 
and  other  equivalent  formulations  of  a  Chebyshev  basis,  as  well  as 
for  a  proof  of  the  following  fundamental  theorem. 

Theorem 

Let h_1(x), ..., h_N(x) be a Chebyshev basis on the interval [a, b]. Then h(x) = Σ_{i=1}^N α_i h_i(x), for some real constants α_i, is a minimax approximation to f(x) on [a, b] if and only if there exist at least N + 1 points a ≤ x_1 < x_2 < ⋯ < x_M ≤ b, M ≥ N + 1, such that

$$f(x_i) - h(x_i) = \pm\|f - h\|_{[a,b]}, \qquad i = 1, \ldots, M,$$

and the sign of the error alternates from point to point. Furthermore, the approximating function h(x) is unique.

In other words, there exists exactly one linear combination of the N Chebyshev basis functions which has at least N + 1 points of equiripple (but alternately signed) error for a given function f(x). It is this linear combination that forms the unique minimax approximation to f(x). If the basis functions do not form a Chebyshev basis, however, the minimax error curve need not be equiripple.

The Remes exchange algorithm [3] employs the equal oscillation error of the minimax approximation to compute the constants α_1, ..., α_N. The algorithm is iterative and has been shown to converge under very general conditions.
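A textbook discrete version of the Remes exchange is sketched below; it is not the implementation used by the author. Each iteration has two steps:

1. Level the error on N + 1 reference points, where N is the number of basis functions.
2. Move the reference to the point of largest error in each run of constant sign.

The example is the kind of problem that appears in the proof of the Sufficiency Theorem: approximating −cos 2u by 1 and cos u on [π/2, π]. In the notation of Section III this is ξ = (0, 1, 2) with u_0 = π/2. The grid size and iteration limit are arbitrary.

```python
import math


def _solve(rows, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(r) + [b] for r, b in zip(rows, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def remez(f, basis, a, b, grid=4001, iters=50):
    """Discrete Remes exchange for the minimax fit of f by sum(c_i * basis_i) on [a, b].

    Returns (coefficients, levelled error E, largest |error| on the grid).
    """
    nb = len(basis)
    xs = [a + (b - a) * j / (grid - 1) for j in range(grid)]
    ref = [a + (b - a) * (1.0 - math.cos(math.pi * k / nb)) / 2.0 for k in range(nb + 1)]
    for _ in range(iters):
        # f(x_k) - sum_i c_i h_i(x_k) = (-1)^k E at each of the nb + 1 reference points
        rows = [[h(x) for h in basis] + [(-1) ** k] for k, x in enumerate(ref)]
        sol = _solve(rows, [f(x) for x in ref])
        coef, E = sol[:nb], sol[nb]
        err = [f(x) - sum(c * h(x) for c, h in zip(coef, basis)) for x in xs]
        max_err = max(abs(e) for e in err)
        if max_err - abs(E) <= 1e-12 * max(1.0, max_err):
            break
        # new reference: the largest |error| in each run of constant sign
        ext, start = [], 0
        for j in range(1, grid + 1):
            if j == grid or (err[j] >= 0.0) != (err[start] >= 0.0):
                ext.append(max(range(start, j), key=lambda i: abs(err[i])))
                start = j
        while len(ext) > nb + 1:  # trim from the ends, which keeps the signs alternating
            ext.pop(0 if abs(err[ext[0]]) < abs(err[ext[-1]]) else -1)
        ref = [xs[i] for i in ext]
    return coef, E, max_err


coef, E, max_err = remez(lambda u: -math.cos(2.0 * u), [lambda u: 1.0, math.cos],
                         math.pi / 2.0, math.pi)
print(coef, E, max_err)
```

Here the exact answer is known: the error is −¼T₂(2cos u + 1). The levelled error therefore has magnitude 1/4 and the coefficients are 5/4 and 2. The corresponding pattern cos 2u + 2cos u + 5/4 has P(0) = 4.25 and equal sidelobes of height 0.25 on [π/2, π].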




SUCCINCT  PAFEBS 


III.  THE  SUFFICIENCY  THEOREM 

As stated in the introduction, every linear array considered is assumed to be symmetric and to have symmetrically tapered element currents about the array's center. If the center of an M element linear array is chosen as the origin of the coordinate system, then the field pattern, as a function of the angle θ measured from a normal to the array, is proportional to the absolute value of

$$\sum_{i=1}^{N}\alpha_i\cos\left(\frac{2\pi x_i}{\lambda}\sin\theta\right),$$

where

λ = wavelength of design frequency,
x_i = distance of the ith element (counted from the center of the array),
α_i = current of the elements at ±x_i (if M is odd, α_1 is half the current).

Putting u = π sin θ and restricting θ to 0 ≤ θ ≤ π/2 to utilize symmetry, the field pattern is proportional to the absolute value of

$$P(u) = \sum_{i=1}^{N}\alpha_i\cos(\xi_i u), \qquad 0 \le u \le \pi, \tag{1}$$

where ξ_i = x_i/(λ/2), i = 1, ..., N. We always have 0 ≤ ξ_1 < ξ_2 < ⋯ < ξ_N. For the field pattern P(u) defined in (1), we define the sidelobe level on the interval [u_1, π) to be the norm ‖P(u)‖_[u_1,π), where u_1 > 0 is the first null of P(u). We define the sidelobe ratio on the same interval to be the ratio |P(0)| ÷ ‖P(u)‖_[u_1,π). Note that both these terms are in linear units. Also, the symbol for the half-open interval [a, b) is interpreted to mean the closed interval [a, b − ε], where ε is some preselected small positive number. We now state and prove the main result.

Sufficiency  Theorem 
Suppose that the functions

$$\cos(\xi_1 u), \ldots, \cos(\xi_N u) \tag{2}$$

form a Chebyshev basis on the interval [0, π), and that the functions

$$\cos(\xi_1 u), \ldots, \cos(\xi_{N-1} u) \tag{3}$$

form a Chebyshev basis on the interval [u_0, π) for some real number u_0, 0 < u_0 < π. Then there is an angle θ_1, 0 < θ_1 < π/2, such that for any specified beamwidth θ, θ_1 ≤ θ < π/2, there exists a unique optimum field pattern. This optimum field pattern will have equal amplitude sidelobes.

Proof: Since the functions (3) form a Chebyshev basis on the interval $[u_0, \pi)$, there must exist a unique minimax approximation to the function $-\cos(\xi_N u)$ from this basis; that is, there exist constants $\alpha_1, \ldots, \alpha_{N-1}$ such that

$$-\cos(\xi_N u) \approx \alpha_1 \cos(\xi_1 u) + \cdots + \alpha_{N-1} \cos(\xi_{N-1} u).$$

Thus if $\epsilon_0$ is the magnitude of the maximum error committed by this uniform approximation, then the function

$$f(u) = \alpha_1 \cos(\xi_1 u) + \cdots + \alpha_{N-1} \cos(\xi_{N-1} u) + \cos(\xi_N u)$$

must oscillate about the zero function in the interval $[u_0, \pi)$ with the magnitude of the oscillation no greater than $\epsilon_0$ and with at least $(N - 1) + 1 = N$ points where the oscillation is exactly $\epsilon_0$. Let $u_1$ be the first zero of $f(u)$ greater than $u_0$ and let $\theta_1 = \sin^{-1}(u_1/\pi)$. We claim that $f(u)$ constitutes the optimum field pattern for a beamwidth to the first null of $\theta_1$. For if this pattern is not optimum, there exists another field pattern function

$$g(u) = \beta_1 \cos(\xi_1 u) + \cdots + \beta_{N-1} \cos(\xi_{N-1} u) + \beta_N \cos(\xi_N u)$$

such that $g(0) = f(0)$, $u_1$ is the smallest positive root of $g(u)$, and $g$ has a strictly smaller sidelobe level on the interval $[u_1, \pi)$. Since $f(u)$ has at least $N$ points of maximum error on $[u_0, \pi)$, $f(u)$ has at least $N - 1$ points of maximum error on $[u_1, \pi)$, and the signs of $f$ alternate at these points. It is clear that any function which is constrained to agree with $f$ at $u = u_1$ and whose magnitude is everywhere strictly less than $\epsilon_0$ on $[u_1, \pi)$ must intersect $f$ in at least $N - 1$ points in the interval $[u_1, \pi)$. Additionally, $f(0) = g(0)$, so that $f$ and $g$ must agree at at least $N$ points in $[0, \pi)$. However, $f(u) - g(u)$ is then a linear combination of the functions (2) which has at least $N$ zeros on $[0, \pi)$, contradicting the definition of a Chebyshev basis unless $f = g$. Thus $f$ is the unique optimum field pattern for a beamwidth of $\theta_1$.

To complete the proof of the theorem, we need to demonstrate that for each angle $\theta > \theta_1$, there exists a number $d_0$, $u_0 < d_0 < \pi$, such that the function

$$f(u) = \gamma_1 \cos(\xi_1 u) + \cdots + \gamma_{N-1} \cos(\xi_{N-1} u) + \cos(\xi_N u)$$

has $d_1 = \pi\sin\theta$ as its first real root greater than or equal to $d_0$, where $\sum_{i=1}^{N-1} \gamma_i \cos(\xi_i u)$ is the uniform approximation to $-\cos(\xi_N u)$ on $[d_0, \pi)$. Since $[d_0, \pi) \subset [u_0, \pi)$, the collection of functions (3) must form a Chebyshev basis on all intervals $[d_0, \pi)$, so that the function $f(u)$ is well-defined for each $d_0$. Also $d_0 > u_0$ implies that $d_1 > u_1$ ($u_1$ as defined earlier), since otherwise the beam pattern for a beamwidth of $\theta_1$ is not optimum. Finally, as $d_0$ is varied continuously, the constants $\gamma_1, \ldots, \gamma_{N-1}$ vary continuously, so that the first real zero greater than $d_0$ varies continuously. This zero equals $u_1 < \pi\sin\theta$ when $d_0 = u_0$, and $d_0$ may be taken as close to $\pi$ as desired, so it must be the case that for some $d_0$ the first real zero of $f(u)$ greater than $d_0$ is equal to $\pi\sin\theta$.


IV. DOLPH-CHEBYSHEV SHADING AS A SPECIAL CASE

As mentioned in the introduction, the Dolph-Chebyshev shadings are designed specifically for a half-wavelength equispaced line array. We consider here only the case of $2N$, $N > 1$, elements in the array. For an odd number of elements, the arguments are essentially unchanged. Counting from the center of the array, the position $x_i$ of the $i$th element is


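$$x_i = \left(\frac{2i - 1}{2}\right)\frac{\lambda}{2},$$

where $\lambda$ is the wavelength of the design frequency. Then

$$\xi_i = \frac{2x_i}{\lambda} = \frac{2i - 1}{2},$$

so that from (1), the field pattern is proportional to the absolute value of

$$P(u) = \sum_{i=1}^{N} \alpha_i \cos\left(\frac{2i - 1}{2}\,u\right),$$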


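where $\alpha_i$ is the element current of the $i$th element from the center of the array. The Dolph-Chebyshev coefficients are determined for each specified beamwidth greater than zero. To apply the theorem of Section III, it is necessary to show that the $N$ functions

$$\cos\left(\frac{u}{2}\right), \cos\left(\frac{3u}{2}\right), \ldots, \cos\left(\frac{2N - 1}{2}\,u\right) \tag{4}$$

and the $N - 1$ functions

$$\cos\left(\frac{u}{2}\right), \cos\left(\frac{3u}{2}\right), \ldots, \cos\left(\frac{2N - 3}{2}\,u\right) \tag{5}$$

form Chebyshev bases on the intervals $[0, \pi)$ and $[u_0, \pi)$, where $u_0 = 0$ here. Consider, then, any linear combination of the functions (4)

$$f(u) = \sum_{i=1}^{N} \beta_i \cos\left(\frac{2i - 1}{2}\,u\right) = \sum_{i=1}^{N} \beta_i\, T_{2i-1}\!\left(\cos\frac{u}{2}\right), \qquad 0 \le u < \pi$$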




114  IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, JANUARY 1975


TABLE I
Optimum Element Currents for Field Patterns Given in Fig. 1 with a Comparison to Optimized Equispaced Arrays

           Element Currents                                  Sidelobe     Beamwidth   Beamwidth
u0         α1         α2         α3         α4        α5          Level (dB)   (deg)       (deg)*
.40138     .66524     .56748     .41913     .23500    1.00000     -9.92         9.72        9.69
.50138    1.40005    1.14597     .79211     .39087    1.00000    -14.91        11.68       11.57
.66138    2.60994    2.03999    1.30817     .54719    1.00000    -20.11        13.73       13.62
.78138    4.23857    3.16892    1.88695     .66452    1.00000    -25.12        15.74       15.65
.88138    5.99725    4.32041    2.41420     .73742    1.00000    -29.50        17.50       17.49

* Beamwidth for a 10-element, half-wavelength, equispaced array with Dolph-Chebyshev currents for the same sidelobe levels.

where $T_{2i-1}$ is the Chebyshev polynomial of degree $2i - 1$. Put $u = 2\cos^{-1}(x)$. Then $0 < x \le 1$ and

$$f(u) = g(x) = \sum_{i=1}^{N} \beta_i T_{2i-1}(x)$$

so that $g(x)$ is a polynomial of degree at most $2N - 1$. Thus $g(x)$ can have at most $2N - 1$ zeros in any interval. Furthermore, $g(x)$ is an odd function, so $g(0) = 0$ and its nonzero roots occur in pairs $\pm x$; hence it can have at most $N - 1$ zeros in the interval $(0, 1]$, so that $f(u)$ can have at most $N - 1$ zeros in $[0, \pi)$. Hence the functions (4) form a Chebyshev basis on the interval $[0, \pi)$. Replacing $N$ by $N - 1$ in this argument shows that the functions (5) also form the required basis.
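The substitution at the heart of this argument is easy to confirm numerically. The following sketch uses Python/NumPy and `numpy.polynomial.chebyshev`; the coefficients `beta` are arbitrary illustrative values, not taken from the paper. It checks that $\cos((2i-1)u/2) = T_{2i-1}(\cos(u/2))$ on $[0, \pi)$. It also checks that the resulting odd polynomial has at most $N - 1$ roots in $(0, 1]$.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

N = 5
u = np.linspace(0.0, np.pi, 1000, endpoint=False)    # samples of [0, pi)
x = np.cos(u / 2)                                     # u = 2 arccos(x), so 0 < x <= 1

# cos((2i - 1) u / 2) equals T_{2i-1}(x) for every i = 1..N.
identity_holds = all(
    np.allclose(np.cos((2 * i - 1) * u / 2), C.chebval(x, [0] * (2 * i - 1) + [1]))
    for i in range(1, N + 1)
)

# g(x) = sum_i beta_i T_{2i-1}(x) is odd with degree <= 2N - 1 (illustrative beta values).
beta = np.array([0.3, -1.2, 0.7, 2.0, -0.5])
coef = np.zeros(2 * N)
coef[1::2] = beta                                     # only odd-degree Chebyshev terms
roots = C.Chebyshev(coef).roots()
real = roots[np.abs(np.imag(roots)) < 1e-9].real
n_pos = int(np.count_nonzero((real > 1e-9) & (real <= 1.0)))
print(identity_holds, n_pos)                          # n_pos never exceeds N - 1 = 4
```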

By the theorem of Section III, for each specified beamwidth $\theta > 0$, there exists a unique optimum field pattern and a unique set of element currents. These currents are the Dolph-Chebyshev coefficients for the beamwidth $\theta$. It should be pointed out that, if $0 < \theta < \pi/(4N - 2)$, the sidelobes have larger amplitude than the main lobe. The next section shows how the Remes exchange algorithm may be used as an alternative means of calculating the Dolph-Chebyshev coefficients, although the usual methods of calculating these coefficients are preferable to this method.
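For comparison, one of the usual routes to the Dolph-Chebyshev coefficients can be sketched as follows. This is an assumed illustration, not a method given in this paper. It sets $x_0 = \cosh(\cosh^{-1}(r)/(2N-1))$, where $r$ is the linear sidelobe ratio, and expands $T_{2N-1}(x_0 x)$ in Chebyshev polynomials using NumPy's `cheb2poly` and `poly2cheb`. It then keeps the odd-degree coefficients as the currents $\alpha_1, \ldots, \alpha_N$, normalized so that $\alpha_N = 1$ as in Table I. The function name `dolph_currents` is hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def dolph_currents(n_half, ratio_db):
    """Currents alpha_1..alpha_N of a 2N-element half-wavelength Dolph-Chebyshev array,
    normalized so that alpha_N = 1. The pattern is P(u) = T_L(x0 cos(u/2)) / x0**L,
    L = 2N - 1, written as sum_i alpha_i T_{2i-1}(cos(u/2))."""
    L = 2 * n_half - 1
    r = 10.0 ** (ratio_db / 20.0)                # linear sidelobe ratio
    x0 = np.cosh(np.arccosh(r) / L)
    power = C.cheb2poly([0] * L + [1])           # T_L in the power basis
    power = power * x0 ** np.arange(L + 1)       # coefficients of T_L(x0 x)
    cheb = C.poly2cheb(power)                    # back to the Chebyshev basis
    alpha = cheb[1::2]                           # only odd-degree terms are present
    return alpha / alpha[-1], x0


alpha, x0 = dolph_currents(5, 20.0)              # ten elements, -20 dB sidelobes
print(np.round(alpha, 5))
```

The detour through the power basis is numerically fragile for long arrays, such as the 80-element arrays mentioned later. For those arrays, a library routine such as SciPy's `scipy.signal.windows.chebwin` would be a safer choice.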

V. EXAMPLE

The example chosen is a ten-element linear array with the elements located at positions proportional to the abscissas of a ten-point Gaussian quadrature formula:

$\xi_1 = 0.68788$
$\xi_2 = 2.00253$
$\xi_3 = 3.13926$
$\xi_4 = 3.99708$
$\xi_5 = 4.50000$.

The length of this array is the same as that of a ten-element half-wavelength equispaced array, but the element positions are substantially displaced from equal spacing. An effort to verify that the functions (2) and (3) form Chebyshev bases in this case was unsuccessful, and direct numerical verification was not attempted. Instead, the Remes exchange algorithm was employed immediately to find the element currents, and the observed behavior of the algorithm itself was used to make inferences about the functions (2) and (3). In this case, it will be seen that the functions (2) are not, in fact, a Chebyshev basis, and that it is likely that the functions (3) are not either. The example presented here shows the utility of the approach even when the hypotheses of the theorem of Section III do not apply.

Ma [4] describes what is essentially the Remes exchange algorithm and applies it to the synthesis of nonuniformly spaced arrays. However, Ma seeks approximations to the function $f(u) = \exp(-Au^2)$, where $A$ is a positive real number, so that the element currents obtained are only approximately optimum. To find the optimum element currents, proceed as in the proof of the theorem in Section III to find a minimax approximation to $-\cos(\xi_5 u)$ in the form

$$-\cos(\xi_5 u) \approx \sum_{i=1}^{4} \alpha_i \cos(\xi_i u) \tag{6}$$

on some interval $[u_0, \pi)$, $u_0 > 0$. The error curve of this approximation over the full interval $[0, \pi)$ is identically the optimum beam pattern for the beamwidth determined by the first null. The parameter $u_0$ alone controls the tradeoff between the sidelobe level and the beamwidth. Therefore, $u_0$ is varied systematically here.

To begin, the Dolph-Chebyshev coefficients corresponding to a sidelobe level of $R = -10$ dB were used as the initial guess for $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$, so that the choice

$$u_0 = 2\cos^{-1}\left(\frac{1}{x_0}\right) = 0.40138$$

was made, where

$$x_0 = \frac{1}{2}\left[\left(r + (r^2 - 1)^{1/2}\right)^{1/L} + \left(r - (r^2 - 1)^{1/2}\right)^{1/L}\right]$$

with $L = M - 1 = 9$ and $r = 10^{1/2}$. With this initial guess on this interval, the Remes exchange algorithm computed the minimax approximation (6) in two iterations and produced the result shown in Table I. To continue the procedure, $u_0$ was incremented by 0.01, and the Remes algorithm was employed again using these newly computed coefficients as the initial guess on this smaller interval. Convergence occurred in two iterations, the beamwidth increased slightly, and the sidelobe level decreased to −10.3 dB. Continuing in this fashion, $u_0$ was systematically increased from 0.40138 to 1.02138. Representative beam patterns appear in Fig. 1 and the corresponding element currents in Table I. Notice that the beamwidths attainable with the Dolph-Chebyshev current amplitudes for an equispaced array are remarkably close to those obtained in this example.
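The starting value can be reproduced from the formula for $x_0$. In the sketch below, which uses Python/NumPy, `initial_u0` is a hypothetical helper. The closed form for $x_0$ used in the text equals $\cosh(\cosh^{-1}(r)/L)$, the root of $T_L(x_0) = r$. The quantity $u_0 = 2\cos^{-1}(1/x_0)$ is the main-lobe point at the sidelobe level. The last lines generate the sweep of $u_0$ in steps of 0.01 described above.

```python
import numpy as np


def initial_u0(sidelobe_db, m_elements):
    """Starting u0 = 2 arccos(1/x0) for an M-element half-wavelength Dolph-Chebyshev
    pattern with the given (negative) sidelobe level in dB."""
    L = m_elements - 1
    r = 10.0 ** (-sidelobe_db / 20.0)          # linear sidelobe ratio: 10**0.5 for -10 dB
    root = np.sqrt(r * r - 1.0)
    x0 = 0.5 * ((r + root) ** (1.0 / L) + (r - root) ** (1.0 / L))  # cosh(arccosh(r)/L)
    return 2.0 * np.arccos(1.0 / x0), x0


u0, x0 = initial_u0(-10.0, 10)
sweep = u0 + 0.01 * np.arange(63)              # 63 values of u0, ending near 1.02138
print(round(u0, 5), round(sweep[-1], 5))
```

Sixty-three steps of 0.01 starting at 0.40138 end at 1.02138, which matches the total of 63 sets of current amplitudes reported for this example.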

By  inspecting  the  beam  patterns  with  the  three  lowest  sidelobe 
levels,  it  is  seen  that  each  of  these  beam  patterns  possesses  5  zeros. 




[2] S. Karlin and W. J. Studden, Tchebycheff Systems: With Applications in Analysis and Statistics. New York: Wiley, 1966.

[3] T. J. Rivlin, An Introduction to the Approximation of Functions. Waltham, Mass.: Blaisdell, 1969.

[4] M. T. Ma, "Another method of synthesizing nonuniformly spaced arrays," IEEE Trans. Antennas Propagat. (Commun.), vol. AP-13, pp. 833-834, Sept. 1965.


Fig. 1. Optimum field patterns for ten-element symmetrically positioned and unequally spaced linear array.


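Since each of these beam patterns is also identically the error curve of a minimax approximation (6) on an interval $[u_0, \pi)$, it must be concluded that the functions (2) do not form a Chebyshev basis on the interval $[0, \pi)$: a nontrivial linear combination of the $N = 5$ functions (2) has 5 zeros, whereas a Chebyshev basis permits at most $N - 1 = 4$. Consequently, the element currents may not be unique.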

For each iteration of the Remes exchange algorithm, the solution of a system of linear equations is required. If there are $N - 1$ Chebyshev basis elements, then one equation in $N$ unknowns is established for each point where equiripple error should occur. By the theorem in Section II, there must be at least $(N - 1) + 1 = N$ points of equiripple error. In the five results given for the present example, there are exactly $N = 5$ points of equiripple error, counting one point on the main lobe down at the sidelobe level (at $u = u_0$), so that unexpected numerical difficulties do not occur. To proceed further than these results requires the solution of six equations in five unknowns, because of the growth of the extra lobe at $\theta = 90^\circ$. The straightforward procedure of solving any five of these six equations proved unsatisfactory because erratic behavior developed in the sidelobe corresponding to the deleted equation. An attempt to solve all six equations in both the least squares sense and the least maximum error sense by employing the generalized inverse of the coefficient matrix also proved unsatisfactory. It would seem, then, that either numerical difficulties are the cause of the problem or the functions (3) do not form a Chebyshev basis. The author favors the latter possibility.

It should be noted that the Dolph-Chebyshev element currents for both a 10-element and an 80-element half-wavelength equispaced array have been computed in the aforementioned manner without difficulty from −10 dB to over −70 dB. (In these cases, an extra sidelobe at 90° never develops.) Unequally spaced arrays with as many as 80 elements have also been successfully treated by this method.

All calculations were performed in double precision on the Univac 1108. Total CPU time, including plot generation, for the example given was 67 s, although a more carefully written program could have reduced this time by at least a factor of two. A total of 63 sets of current amplitudes were computed.

VI.  SUMMARY 

Sufficient conditions for the existence of optimum field patterns for symmetrically spaced and amplitude tapered linear arrays have been proved. The theorem proved is a generalization of the work of Dolph on half-wavelength spaced linear arrays. A well-known algorithm from approximation theory has been employed in an example to compute the element currents corresponding to the optimum beam patterns using only the given element spacings themselves.


REFERENCES 


[1] C. L. Dolph, "A current distribution for broadside arrays which optimizes the relationship between beam width and side-lobe level," Proc. IRE Waves Electrons, vol. 34, pp. 335-348, June 1946.




Optimized  Symmetric  Discrete  Line  Arrays 


R.  L.  Streit 




860  IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, NOVEMBER 1975


Optimized  Symmetric  Discrete  Line  Arrays 
ROY L. STREIT

Abstract — A  generalization  of  Dolph's  method  for  the  synthesis  of 
discrete  antenna  arrays  is  applied  to  six  different  symmetric  line  arrays. 
Based on these examples, it is concluded that 1) the field patterns of optimized symmetric line arrays with the same number of elements and with the same aperture are virtually indistinguishable and 2) optimized arrays with an odd number of elements are substantially better, in general, than arrays with an even number of elements.

I.  Introduction 

If all the elements in a linear array are equally spaced at a half wavelength, then Dolph's method [1] may be used to compute the element currents of optimum field patterns for any specified beamwidth and for any number of array elements. Optimized equispaced arrays have two striking characteristics: 1) as the specified beamwidth is increased, the corresponding sidelobe level diminishes and 2) all the sidelobes are of equal amplitude. In the generalization of Dolph's method, both 1) the tradeoff between the beamwidth and the sidelobe level and 2) the equal amplitude sidelobe structure are extended to a larger class of symmetric arrays.

By application of the generalized Dolph method to six specific arrays, it was noticed that the method was more successful for arrays with an odd number of elements than for arrays with an even number of elements. All symmetric arrays with the same odd number of elements and the same aperture as a half-wave equispaced array seem to possess optimum field patterns with equal amplitude sidelobes for any specified beamwidth. Only very special even-numbered symmetric arrays appear to share this feature; that is, for nearly all even arrays, element currents which uniformly suppress all sidelobes do not appear to exist for every desired beamwidth. Equispaced even-numbered arrays seem to be the primary exception to this statement.

A second observation based on these examples is this: for fixed aperture, odd number of elements, and desired beamwidth, any symmetric set of positions is nearly as good as any other symmetric set, provided the element currents are optimized for the given positions. To find unequally spaced arrays with field patterns substantially better than optimized equispaced arrays thus requires either a larger aperture or more elements. If an even number of elements is specified, these observations are false: equal spacing is definitely better, in general, because of the remarks in the preceding paragraph.

The directional response of a symmetric line array of $M$ elements is directly proportional to the absolute value of a linear combination of cosines:

$$P(u) = \sum_{k=1}^{N} a_k \cos(\xi_k u), \qquad 0 \le u < \pi$$

where

$$\xi_k = \frac{x_k}{\lambda/2}, \qquad u = \pi\sin\theta$$


Manuscript received March 2, 1975; revised June 12, 1975.

The author is with the Naval Underwater Systems Center, New London Laboratory, New London, Conn. 06320.


Fig. 1. Field patterns for element currents in Tables I and II (plotted versus θ, in degrees).

and $\theta$ is measured from a normal to the line of the array, $\lambda$ is the wavelength of the design frequency, $\{x_k\}$ are element positions measured from the array center, and $\{a_k\}$ are element currents. Thus, if $M$ is odd, the center element must lie at the origin and the center element current is half of $a_1$. In Fig. 1 the field patterns shown are $20\log_{10}|P(u)|$, normalized by its maximum absolute value on $[0, \pi)$, but plotted versus the angle $\theta$ and not $u$.

The theorem in [2] is valid for both even and odd numbers of elements. For the odd case, however, the result can be strengthened. In the following theorem, the beamwidth is measured from $u = 0$ down to the sidelobe level. Also, by definition, a collection of continuous functions forms a Chebyshev basis on a finite interval if the zeros of any nontrivial linear combination of these functions number at most one less than the number of functions in the basis.

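Theorem (Odd Case): If $M$ is odd, then the function

$$P(u) = \cos(\xi_N u) + \sum_{k=1}^{N-1} a_k \cos(\xi_k u), \qquad 0 \le u < \pi$$

is the unique optimum field pattern for a beamwidth of $\arcsin(u_0/\pi)$ provided

i) $\displaystyle \max_{u \in [u_0,\pi)} |P(u)| = \min_{a_1,\ldots,a_{N-1}} \max_{u \in [u_0,\pi)} \left|\cos(\xi_N u) + \sum_{k=1}^{N-1} a_k \cos(\xi_k u)\right|$

ii) $\{\cos(\xi_1 u), \ldots, \cos(\xi_{N-1} u)\}$ is a Chebyshev basis on $[u_0, \pi)$

iii) $\{\cos(\xi_1 u), \ldots, \cos(\xi_N u)\}$ is a Chebyshev basis on $[0, \pi)$.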




COMMUNICATIONS 


861 


This result is stronger than that given in [2] because the beamwidth is given as an explicit function of $u_0$. This is possible because Meinardus has shown ([3, theorem 30]) that assumptions ii) and iii) together imply that $u_0$ must be an extreme point of the approximation i). Therefore, $u_0$ is precisely that point on the main lobe which is down at the sidelobe level. (It can also be shown that the first zero of the minimax approximation i) is a strictly increasing function of $u_0$, so that the theorem is essentially unchanged if beamwidth is measured to the first null as in [1] instead of down to the sidelobe level. The only difference is that the beamwidth cannot be given as an explicit function of $u_0$. See [5].)
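Because the odd-case beamwidth is the explicit function $\arcsin(u_0/\pi)$, it can be evaluated directly. The short sketch below uses Python/NumPy, and `beamwidth_deg` is a hypothetical helper. It converts $u_0$ to degrees and compares the result with the beamwidths printed in Table II for $u_0 = .341$ and $u_0 = .441$. Since those $u_0$ values are printed to only three decimals, agreement to within about 0.005° is all that can be expected.

```python
import numpy as np


def beamwidth_deg(u0):
    """Beamwidth in degrees, measured from u = 0 down to the sidelobe level: arcsin(u0/pi)."""
    return np.degrees(np.arcsin(np.asarray(u0, dtype=float) / np.pi))


u0_values = np.array([0.341, 0.441])      # u0 column headings of Table II
printed_bw = np.array([6.236, 8.074])     # beamwidths (deg) printed in Table II
print(np.round(beamwidth_deg(u0_values), 3), printed_bw)
```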

Optimum field patterns (in the odd case) do not necessarily possess exactly $N - 1$ equal amplitude sidelobes, nor are optimum field patterns necessarily unique, unless the assumptions ii) and iii) are made. The assumption ii) is well known ([3, theorems 19 and 20]) to be both a necessary and sufficient condition for the existence and uniqueness of the minimax approximation i), so that without this assumption an optimum field pattern need not be unique even if it satisfies i). Furthermore, Meinardus ([3, theorem 30]) has shown that the assumptions ii) and iii) together imply the existence of exactly $N$ extreme points of the minimax approximation i). Since one of these extreme points must be on the main lobe itself, and every other extreme point corresponds to a sidelobe, without both these assumptions an optimum field pattern satisfying i) need not possess precisely $N - 1$ sidelobes.

A nearly identical result holds for an even number of elements, but an explicit relation between beamwidth and the parameter $u_0$ is not available. Meinardus' result applies only to Chebyshev systems containing the constant 1 as a basis function, a circumstance which occurs for symmetric line arrays for odd $M$ only. Thus, the strongest statement possible is that the beamwidth is perhaps (slightly) smaller than $\arcsin(u_0/\pi)$.

These results can be extended in two directions. First, the synthesis of steered arrays can be performed by computing the approximation i) on intervals extending as far beyond $\pi$ as desired. Second, the fact that ii) and iii) are cosine bases is never used in the proof of the results. Ma [4, p. 215] gives an example of concentric continuous rings whose field pattern is a linear combination of basis functions of the form $J_0(c_n u)$. Therefore, the theorem above also yields an optimality result for this case.

II.  Application  to  Arrays  With  Odd  Number  of  Elements 

Three 25-element arrays, called herein (for convenience) Random, Dolph, and Gauss, are considered. The Dolph array is equispaced. In the Random array, all the element positions except the first and the last are displaced strictly toward the origin from equal spacings by no more than 0.5, but otherwise in a random fashion. The Gauss array has elements located at positions proportional to a 25-point Gaussian quadrature formula. Thus the Gauss array is substantially more perturbed from equal spacing than is the Random array. All three arrays have the same aperture.
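The Gauss positions can be regenerated from the Gauss-Legendre abscissas. The sketch below uses Python/NumPy and `numpy.polynomial.legendre.leggauss`; `gauss_positions` is a hypothetical helper. It keeps the nonnegative abscissas and scales them so that the outermost element lies at the same aperture as the equispaced array, which is 12 half wavelengths for 25 elements. Scaling by the largest abscissa is an assumption. Under that assumption, the result agrees with the element positions listed in Table II. With 10 points and a half-aperture of 4.5, it also agrees with the ten-element positions used in the earlier example.

```python
import numpy as np


def gauss_positions(m_elements, half_aperture):
    """Nonnegative positions (in half wavelengths) proportional to the M-point
    Gauss-Legendre abscissas, scaled so the outermost element sits at half_aperture."""
    nodes, _weights = np.polynomial.legendre.leggauss(m_elements)
    half = np.sort(nodes)[m_elements // 2:]    # center node (odd M) plus the positive ones
    return half_aperture * half / half[-1]


print(np.round(gauss_positions(25, 12.0), 3))   # 0, 1.481, 2.939, ..., 11.772, 12.0
print(np.round(gauss_positions(10, 4.5), 5))    # 0.68788, 2.00253, 3.13926, 3.99708, 4.5
```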

The Dolph positions satisfy the theorem statement (see [2]). However, for reasons that will be stated later, the Random array does not satisfy condition iii), whereas the Gauss array does not satisfy condition ii). Therefore, computational difficulties can be expected for the Gauss array. Also, the field patterns presented for the Gauss array may not be optimum.

The application of the method to the Random and Gauss arrays is detailed in Tables I and II, respectively, and follows the development in [2]. Since in each case the proper choice of the subinterval $[u_0, \pi)$ could not be made beforehand, an arbitrary starting point of $u_0 = 0.1$ was picked and the Remes exchange algorithm was employed to compute the minimax approximation i) on the subinterval $[0.1, \pi)$. Then $u_0$ was incremented by 0.01 and the Remes exchange algorithm was used again on the slightly smaller interval $[0.11, \pi)$. Continuing in this fashion gave a family of optimum element currents. Four sets of element currents for both arrays are given in Tables I and II. Notice that the beamwidth-sidelobe level tradeoff is as expected in both cases. Also, the element currents appear to be continuous functions of the sidelobe parameter $u_0$ except in the Gauss array. For the Gauss array, nonuniqueness is the consequence of violating the second condition of the theorem.

TABLE I
Optimum Element Currents for 25-Element Random Array

Element       Element current
position      u0 = .151    u0 = ?     u0 = .341    u0 = .434
   .000         .031        .136        .427        1.165
   .777         .250       1.1??       3.140        8.313
  1.930         .122        .500       1.503        3.964
  2.590         .193        .775       2.277        5.895
  3.867         .226        .689       2.555        6.440
  4.849         .11?        .411       1.223        2.966
  5.695          ?          .721       1.939        4.567
  6.?71         .158        .549       1.380        3.035
  7.531         .116        .393        .963        2.051
  8.675         .206        .644       1.437        2.785
  9.941         .146        .420        .844        1.457
 10.780         .125        .329        .594         .911
 12.000        1.000       1.000       1.000        1.000
Sidelobe level (dB)   -9.998     -20.062     -29.906     -39.950
Beamwidth (deg)        2.759       4.541       6.236       7.936

TABLE II
Optimum Element Currents for 25-Element Gauss Array

Element       Element current
position      u0 = .151    u0 = ?     u0 = .341    u0 = .441
   .000         .263       1.153       5.858      -58.734
  1.481         .259       1.423       5.693      -58.571
  2.939         .249       1.337       5.231      -50.580
  4.353         .234       1.20?       4.550      -42.064
  5.701         .213       1.049       3.755      -32.639
  6.963         .189        .87?       2.950      -23.741
  8.119         .1?4        .709       2.213      -16.252
  9.152         .137        .554       1.596      -10.636
 10.046         .113        .415       1.092       -6.289
 10.788         .078        .306        .769       -4.874
 11.366         .099        .193        .364         .359
 11.772        -.106        .132        .474       -6.678
 12.000        1.000       1.000       1.000        1.000
Sidelobe level (dB)   -9.994     -20.189     -29.507     -39.371
Beamwidth (deg)        2.756       4.587       6.236       8.074

The  accuracy  of  the  results  can  be  estimated  by  comparing 
similar  results  for  the  Dolph  array  with  those  that  can  be  obtained 
explicitly.  It  was  found  that  the  currents  were  correct  to  4  or  5 
significant  decimals.  More  accuracy  was  not  attained  because 
a  discrete  version,  and  not  a  continuous  version,  of  the  Remes 
exchange  algorithm  was  implemented  in  the  computer  program. 

The field patterns for the element currents in Tables I and II are shown in Fig. 1. Even though the element currents and positions are quite dissimilar, all three sets of field patterns are



nearly identical and there is very little difference in the beamwidths to be had for the same sidelobe level. However, it can be seen that the Random array is slightly superior to the Dolph array. To the author's knowledge, this is the first explicit example of an optimum unequally spaced array which can be shown to be superior to an optimum equispaced array of exactly the same length and the same number of elements. Practically speaking, however, the field patterns are virtually identical in all three cases.

The Random array does not satisfy condition iii) because the field pattern for −40 dB has 13 zeros. Since the field pattern is a linear combination of the 13 functions in iii), the Chebyshev condition fails. That the Gauss array fails to satisfy condition ii) is not as straightforward. It can be shown that the smaller the interval of approximation, the smaller the error of the minimax approximation on that interval. The four given approximations (Table II) satisfy the requirements of Chebyshev's theorem ([3, theorem 23]) and therefore should be minimax approximations; however, the error of these approximations does not decrease with increasing $u_0$. Hence, the Gauss array cannot satisfy condition ii).
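The zero-counting argument used for the Random array can be automated, at least as a one-sided test. The sketch below uses Python/NumPy; `count_sign_changes` and `violates_chebyshev` are hypothetical helpers. It counts sign changes of a given combination on a grid over $[0, \pi - \epsilon]$, and grid sampling can miss closely spaced or double zeros. If there are as many zeros as functions, the set is not a Chebyshev basis, as with the 13 zeros of the −40 dB Random pattern. No finite number of such checks can establish the Chebyshev property.

```python
import numpy as np


def count_sign_changes(xi, a, eps=1e-3, n_grid=200001):
    """Count sign changes of P(u) = sum_k a_k cos(xi_k u) on a grid over [0, pi - eps].
    Each sign change brackets at least one zero of P."""
    u = np.linspace(0.0, np.pi - eps, n_grid)
    s = np.sign(np.cos(np.outer(u, xi)) @ np.asarray(a, dtype=float))
    s = s[s != 0]                                   # ignore exact grid zeros
    return int(np.count_nonzero(s[1:] != s[:-1]))


def violates_chebyshev(xi, a, **kw):
    """True if this combination of len(xi) cosines has at least len(xi) zeros,
    which rules out a Chebyshev basis on [0, pi)."""
    return count_sign_changes(xi, a, **kw) >= len(xi)


xi = np.arange(1, 6) - 0.5                          # equispaced, ten elements
print(count_sign_changes(xi, np.ones(5)))           # uniform weighting: zeros at k*pi/5, k = 1..4
print(violates_chebyshev([0.5, 5.5], [0.0, 1.0]))   # cos(5.5u) alone has 5 zeros: True
```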

III. Application to Arrays With Even Number of Elements

The chief difference, numerically, between even-numbered arrays and odd-numbered arrays is that the field pattern function $P(u)$ does not contain the constant 1 as a basis function, because no element lies at the array center. Meinardus' result does not apply, and the problem that develops in the even case is that one sidelobe too many develops, so that not all of them can be suppressed.

The Remes exchange algorithm establishes one linear algebraic equation for each sidelobe and one equation corresponding to a point on the main lobe down at the sidelobe level. Because of Chebyshev's theorem, there must be at least $N$ equations in exactly $N$ unknowns when the minimax approximation i) has been found. For arrays with an odd number of elements, there is never any difficulty because there are always exactly $N$ equations in $N$ unknowns ([3, theorem 30]). The difficulty with even arrays, then, is that eventually $N + 1$ equations in $N$ unknowns must be solved exactly, and these equations prove to be inconsistent.

Three arrays which are the 24-element analogs of the same-named 25-element arrays were considered. The Dolph array satisfied the theorem statement, the Random array did not satisfy condition iii), and the Gauss array satisfied neither condition ii) nor condition iii). The application of the Remes exchange algorithm to these three arrays proceeded exactly as in Section II, and field patterns with sidelobe levels of −10 dB, −15 dB, and −20 dB were obtained. Too many sidelobes never develop for the Dolph array because $P(u)$ must vanish at $u = \pi$ (i.e., $\theta = 90^\circ$) for every choice of element currents, since $\cos(\xi_k \pi) = 0$ whenever $\xi_k = (2k - 1)/2$. The other two arrays did not have this feature, so that inevitably a sidelobe appeared at $u = \pi$. For small $u_0$, this sidelobe either did not exist or it was small, but for larger $u_0$, the sidelobe at $u = \pi$ appeared and increased in size until it equaled the other sidelobes in magnitude. At this point, the difficulty of the overdetermined system of equations developed. In consequence, the minimax approximation i) does not change until $u_0$ has increased to the point that the first sidelobe is included as part of the main lobe. The extra equation can then be dropped, and further reduction of the remaining sidelobes is then possible. Dropping the first equation is equivalent to losing control of the first sidelobe. The remaining




sidelobes may well be reduced in amplitude, but only at the expense of a bad first lobe. Field patterns with all the sidelobes at, say, -30 dB were not computed because no such field patterns existed for the Random and Gauss arrays. Since this procedure is unavoidable, the conclusion must be that the equispaced Dolph array is much superior to either of the other two arrays. As long as the extra sidelobe was not a problem, the three arrays were indistinguishable from the standpoint of the beamwidth versus sidelobe-level tradeoff.

The development of the extra sidelobe seems inevitable in the general even case. Since the extra sidelobe prevents a uniform suppression of all sidelobes, it follows that optimized arrays with an odd number of elements have a substantially better character than optimized arrays with an even number of elements.

IV. Summary

A generalization of Dolph's method is applied to six different arrays. Based on these examples, it is concluded that optimized equispaced arrays are as good as any other optimized symmetric line array with the same number of elements and aperture. Another conclusion is that arrays with an odd number of elements have better behavior than arrays with an even number of elements.


REFERENCES

[1] C. L. Dolph, "A current distribution for broadside arrays which optimizes the relationship between beamwidth and sidelobe level," Proc. IRE and Waves and Electrons, vol. 34, pp. 335-348, June 1946.
[2] R. L. Streit, "Sufficient conditions for the existence of optimum beam patterns for unequally spaced linear arrays with an example," IEEE Trans. Antennas Propagat., vol. AP-23, pp. 112-115, Jan. 1975.
[3] G. Meinardus, Approximation of Functions: Theory and Numerical Methods. New York: Springer-Verlag, 1967.
[4] M. T. Ma, Theory and Application of Antenna Arrays. New York: Wiley-Interscience, 1974.
[5] R. L. Streit, "Extremals and zeros in Markov systems are monotone functions of one end point," in Proc. Conf. on the Theory of Approximation, Calgary, Alta., Canada, 1975, and in Theory of Approximation. New York: Academic, to appear.




Real Excitation Coefficients Suffice
for Sidelobe Control in a Linear Array

J. T. Lewis and R. L. Streit




IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, VOL. AP-30, NO. 6, NOVEMBER 1982


Real  Excitation  Coefficients  Suffice  for  Sidelobe 
Control  in  a  Linear  Array 

JAMES T. LEWIS AND ROY L. STREIT


Abstract: Minimax design of a linear antenna array with arbitrary fixed elements leads to the following mathematical problem:

$$\min_{w_k \ \text{complex}} \ \max_{u_0 \le |u| \le u_1} |T(u)| \quad \text{subject to} \quad T(0) = 1,$$

where $T(u) = \sum_k w_k \exp(-i d_k u)$ and the $d_k$ are real numbers. It is proven that this problem has a solution with real excitation coefficients $w_k$. In the antenna application this shows that there is no need to allow phasing at the individual elements of the array; amplitude control alone will achieve all the sidelobe reduction possible. An analogous result can be proved for a more general complex approximation problem.


We consider a linear antenna array with $N$ omnidirectional elements located at arbitrary fixed positions $\{x_k\}$ receiving a plane wave of wavelength $\lambda$ from the direction $\theta_a$, $-\pi/2 \le \theta_a \le \pi/2$, relative to a normal to the array. If the array is steered to look in the direction $\theta_l$, $-\pi/2 \le \theta_l \le \pi/2$, then the complex transfer function of the beamformer is given by

$$T(u) = \sum_{k=1}^{N} w_k \exp(-i d_k u),$$

where $\{w_k\}$ are the element excitation coefficients, $d_k = 2\pi x_k/\lambda$, and $u = \sin\theta_a - \sin\theta_l$. The coefficients $w_k$ may be complex in general. The peak response should occur at $u = 0$; we make the usual normalization

$$T(0) = \sum_{k=1}^{N} w_k = 1.$$

To effect small sidelobes we wish to minimize $|T(u)|$ for $|u| \ge u_0$, where $u_0 > 0$ is chosen small.

The total range of $u$ depends on the look direction $\theta_l$. First, let us consider only the case of the array steered broadside. Thus $\theta_l = 0$ and the range of $u$ becomes $-1 \le u \le 1$, corresponding to $-1 \le \sin\theta_a \le 1$ for $-\pi/2 \le \theta_a \le \pi/2$. Hence the problem of selecting excitation coefficients to effect minimum overall sidelobe level becomes a minimax problem:

$$\min_{w_k \ \text{complex}} \ \max_{u_0 \le |u| \le 1} |T(u)| \quad \text{subject to} \quad T(0) = \sum_{k=1}^{N} w_k = 1. \tag{1}$$

The case where the array is steered through the same number of degrees either side of broadside is very similar mathematically to the case $\theta_l = 0$ and is discussed below.


Manuscript received October 30, 1981; revised January 4, 1982.

J. T. Lewis was with the Naval Underwater Systems Center, New London Laboratory, New London, CT 06320, during the summer of 1981, on leave from the Department of Mathematics, University of Rhode Island, Kingston, RI 02881.

R. L. Streit is with the Naval Underwater Systems Center, New London Laboratory, New London, CT 06320.


A standard argument shows that a solution to problem (1) exists; however, it may not be unique. In general the excitation coefficients $w_k$ are allowed to be complex; we now prove that a solution of (1) exists with the $w_k$ all real. First, denoting complex conjugates by an overbar,

$$\max_{u_0 \le |u| \le 1} \Big|\sum_{k=1}^{N} w_k e^{-i d_k u}\Big| = \max_{u_0 \le |u| \le 1} \Big|\sum_{k=1}^{N} \bar{w}_k e^{i d_k u}\Big| = \max_{u_0 \le |u| \le 1} \Big|\sum_{k=1}^{N} \bar{w}_k e^{-i d_k u}\Big|.$$

The first equality holds because a complex number and its conjugate have the same modulus and $d_k u$ is real. The last equality follows from the fact that $u_0 \le |-u| \le 1$ if and only if $u_0 \le |u| \le 1$; i.e., the range of $u$ is symmetric about $u = 0$. Now

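$$\begin{aligned}
\max_{u_0 \le |u| \le 1} \Big|\sum_{k=1}^{N} (\operatorname{Re} w_k)\, e^{-i d_k u}\Big|
&= \max_{u_0 \le |u| \le 1} \Big|\sum_{k=1}^{N} \frac{w_k + \bar{w}_k}{2}\, e^{-i d_k u}\Big| \\
&\le \frac{1}{2} \max_{u_0 \le |u| \le 1} \Big|\sum_{k=1}^{N} w_k e^{-i d_k u}\Big| + \frac{1}{2} \max_{u_0 \le |u| \le 1} \Big|\sum_{k=1}^{N} \bar{w}_k e^{-i d_k u}\Big| \\
&= \max_{u_0 \le |u| \le 1} \Big|\sum_{k=1}^{N} w_k e^{-i d_k u}\Big|
\end{aligned}$$

from above. This guarantees the existence of a real solution of problem (1) as asserted, since

$$\sum_{k=1}^{N} w_k = 1 \quad \text{implies} \quad \sum_{k=1}^{N} \operatorname{Re} w_k = \operatorname{Re} \sum_{k=1}^{N} w_k = 1;$$

that is, if $w_1, \ldots, w_N$ solve (1), then $\operatorname{Re} w_1, \ldots, \operatorname{Re} w_N$ are feasible and give an objective value no larger, so they also solve (1).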

We now note that, since $|T(-u)| = |\overline{T(u)}| = |T(u)|$ when $w_1, \ldots, w_N$ are real, we can further simplify problem (1) to

$$\min_{w_k \ \text{real}} \ \max_{u_0 \le u \le 1} |T(u)| \quad \text{subject to} \quad T(0) = \sum_{k=1}^{N} w_k = 1. \tag{1a}$$

Hence, we can find a solution to problem (1) by solving the easier problem (1a). This has important practical implications for the design of an antenna array. It indicates that there is no need to allow phasing at the individual elements; amplitude of excitation alone will achieve all the sidelobe reduction that is possible.
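The inequality chain above can be spot-checked numerically. The following is a minimal sketch, not part of the original paper: it assumes hypothetical random element positions and a random complex weight vector, evaluates $|T(u)|$ on a grid symmetric about $u = 0$, and checks that replacing each $w_k$ by $\operatorname{Re} w_k$ keeps $T(0) = 1$ without raising the peak sidelobe. It does not solve the minimax problem (1) itself; the helper name `peak_sidelobe` is illustrative only.

```python
import numpy as np

def peak_sidelobe(w, d, u0, u_max=1.0, num=4001):
    """Peak |T(u)| over u0 <= |u| <= u_max, where T(u) = sum_k w_k exp(-i d_k u)."""
    u_pos = np.linspace(u0, u_max, num)
    u = np.concatenate([-u_pos[::-1], u_pos])      # grid symmetric about u = 0
    T = np.exp(-1j * np.outer(u, d)) @ w
    return np.abs(T).max()

rng = np.random.default_rng(0)
N = 8
positions = np.sort(rng.uniform(0.0, 4.0, N))    # hypothetical x_k / lambda values
d = 2.0 * np.pi * positions                      # d_k = 2 pi x_k / lambda
w = rng.normal(size=N) + 1j * rng.normal(size=N)
w = w / w.sum()                                  # normalization T(0) = sum_k w_k = 1

w_real = w.real                                  # sum_k Re(w_k) = Re(1) = 1
peak_complex = peak_sidelobe(w, d, u0=0.2)
peak_real = peak_sidelobe(w_real, d, u0=0.2)
print(f"complex weights: {peak_complex:.4f}, real parts: {peak_real:.4f}")
```

Because the grid is symmetric, the discrete maximum obeys the same inequality as the continuous one, so the real-part weights can never do worse on this grid.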

The above analysis was for the look angle $\theta_l = 0$. Now let us regard $\theta_l$ as not being fixed; then the range of $u$ becomes $-2 \le u \le 2$. The problem corresponding to (1) with $\theta_l$ bounded away from endfire is

$$\min_{w_k} \ \max_{u_0 \le |u| \le 2 - u_1} |T(u)| \quad \text{subject to} \quad T(0) = \sum_{k=1}^{N} w_k = 1. \tag{2}$$

As above, we can show the existence of a solution of (2) with real excitation coefficients.

Now, let us consider a more general complex approximation problem. Let $f, h_1, \ldots, h_N$ be continuous complex-valued functions defined on a closed and bounded set $Q$ in the complex plane. ($Q$ can be finite or infinite.) The minimax approximation problem is

$$\min_{a_k} \ \max_{z \in Q} \Big| f(z) - \sum_{k=1}^{N} a_k h_k(z) \Big|. \tag{3}$$

Here the $a_k$ are allowed to be complex. If, for all $z \in Q$, $f(\bar{z}) = \overline{f(z)}$ and $h_k(\bar{z}) = \overline{h_k(z)}$, $k = 1, \ldots, N$, and $Q$ is symmetric with respect to the real axis (i.e., $q \in Q$ if and only if $\bar{q} \in Q$), then a solution of (3) with real coefficients exists. We omit the details of the verification.

Finally, we note that real excitation coefficients are not adequate for every use of a linear array or for every pattern desired. For example, if a null is required in the pattern at a point $u \ne 0$, the equation $T(u) = 0$ would be added to problem (1). However, a solution with real coefficients would then not necessarily exist.


Abstract 


Two  different  constructive  techniques  for  approximating 
positive  definite  functions  by  means  of  finite  exponential  sums  are 
explored.  One  technique  constructs  the  coefficients  and  the 
exponents.  The  other  technique  constructs  the  exponents  when  the 
coefficients  are  all  required  to  be  equal.  Both  approximation 
techniques  appear  to  be  suitable  for  numerical  computation.  The 
techniques  extend  to  completely  monotonic  functions  as  well. 
Error  bounds  are  proved  using  elementary  methods. 

In  an  application,  these  error  bounds  can  be  used  to 
eliminate  some  of  the  effort  and  guesswork  previously  necessary 
in  two  procedures  for  the  design  and  synthesis  of  sparse  broadband 
linear  arrays. 




TR  6357 


Two  Exponential  Approximation  Methods 


I.  Introduction 

Two design procedures for aperiodic, or space tapered, linear arrays are investigated in this report in a setting much more general than the usual setting. One procedure, due to Bruce and Unz [1], gives both element excitations ("shadings") and positions. The other procedure, due to Maffett [2], gives element positions under the condition that all excitations are unity. Both seek desirable radiation patterns with minimal grating lobes. These methods synthesize sparse broadband arrays that are less sensitive to frequency changes than periodic (equispaced) arrays.

Using  either  of  these  procedures,  the  designer  must  guess  the  number  of 
elements  required,  perform  the  appropriate  numerical  computations,  examine  the 
resulting  radiation  pattern,  and  then  decide  if  more  elements  are  required  or  if  fewer 
elements  will  suffice.  In  this  report,  error  bounds  are  derived  that  provide  estimates 
on  the  number  of  elements  necessary  for  a  given  degree  of  approximation  of  the 
desired  radiation  pattern.  Thus,  some  of  the  effort  and  guesswork  inherent  in  these 
procedures  can  be  eliminated. 

Neither  of  these  two  methods  is  intrinsically  limited  to  aperiodic  array  design. 
Generalizations  turn  out  to  be  worthwhile  and  of  independent  interest.  Therefore, 
this  report  addresses  only  the  general  setting  from  this  point  on. 

A complex-valued function $f$ of a real variable is defined to be positive definite if and only if, for each integer $n \ge 1$, the inequality

$$\sum_{i,j=1}^{n} f(x_i - x_j)\, a_i \bar{a}_j \ge 0 \tag{1.1}$$

holds for all $x_1, \ldots, x_n \in \mathbb{R}$ (the real numbers) and $a_1, \ldots, a_n \in \mathbb{C}$ (the complex numbers). Bochner's Theorem states the following: If $f$ is a continuous function on $\mathbb{R}$, then $f$ is positive definite if, and only if, there exists a bounded non-decreasing function $V$ on $\mathbb{R}$ such that $f$ is the Fourier-Stieltjes transform of $V$; that is,

$$f(x) = \int_{-\infty}^{\infty} e^{i\sigma x}\, dV(\sigma), \qquad x \in \mathbb{R}. \tag{1.2}$$

The recent paper of Stewart [3] gives references to various proofs of Bochner's Theorem and its generalizations. We point out, for future use, that (1.2) immediately implies that the total variation of $V$ equals $f(0)$ and that, for all real $x$, $f(-x)$ equals the complex conjugate of $f(x)$. Goldberg [4] proves that any positive definite function $f$ satisfies $|f(x)| \le f(0)$ for all real $x$.
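Definition (1.1) says that every matrix $G_{ij} = f(x_i - x_j)$ must be positive semidefinite, which suggests a quick numerical spot check. The sketch below is an illustration only (a non-negative smallest eigenvalue on one random point set is consistent with, but does not prove, positive definiteness); the point set and helper name are assumptions. It tests $\sin x / x$ and $e^{-x^2}$, both positive definite, against $x^2$, which is not.

```python
import numpy as np

def min_gram_eigenvalue(f, points):
    """Smallest eigenvalue of the matrix G[i, j] = f(x_i - x_j) appearing in (1.1)."""
    G = f(points[:, None] - points[None, :])
    return np.linalg.eigvalsh(G).min()      # G is Hermitian because f(-x) = conj(f(x))

def sinc(x):
    return np.sinc(x / np.pi)                # sin(x)/x, equal to 1 at x = 0

def gaussian(x):
    return np.exp(-x ** 2)

def square(x):
    return x ** 2                            # not positive definite: zero diagonal, zero trace

rng = np.random.default_rng(1)
pts = rng.uniform(-10.0, 10.0, 40)
for name, f in [("sinc", sinc), ("gaussian", gaussian), ("x^2", square)]:
    print(name, min_gram_eigenvalue(f, pts))
```

Tiny negative values of order $10^{-15}$ for the first two functions are rounding effects of nearly singular matrices, not violations of (1.1).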

In this report, we restrict our attention, for the most part, to continuous positive definite functions $f$ on $\mathbb{R}$ that can be written

$$f(x) = \int_{-A}^{A} e^{i\sigma x}\, dV(\sigma), \qquad x \in \mathbb{R}, \tag{1.3}$$

for some real number $A$ such that $0 < A < \infty$. In other words, we have assumed that $V(\sigma)$ is constant for $\sigma < -A$ and for $\sigma > A$. For functions satisfying (1.3), we develop in an elementary manner an approximation to $f(x)$ of the form

$$S_n(x) = f(0) \sum_{k=1}^{n} \alpha_k e^{i\sigma_k x}, \tag{1.4}$$

where $|\sigma_k| < A$, $k = 1, \ldots, n$. We give an error bound for this approximation in Theorem 1. This approximation always gives positive coefficients $\alpha_k$ and exponents $\sigma_k \in \mathbb{R}$ that are located at the roots of an appropriate orthogonal polynomial. We suspect that these approximations are near-optimal in some well-defined sense. (See Schabach [5, p. 1018] for a relevant conjecture about a particular function $f$.)

Under various additional assumptions concerning $V$, we develop an approximation to $f(x)$ of the form

$$Q_n(x) = \frac{f(0)}{n} \sum_{k=1}^{n} e^{i\sigma_k x}, \tag{1.5}$$

where $|\sigma_k| < A$, $k = 1, \ldots, n$, and we give an error bound in Theorem 2. The approximation $Q_n(x)$ cannot be as efficient in general as the approximation $S_n(x)$; however, $Q_n(x)$ has the advantage of being much more easily constructed in practice for almost any reasonable $n$ (say, $n \le 10^6$).

Note that both the approximations $S_n(x)$ and $Q_n(x)$ are readily written in the form (1.3) and, therefore, are positive definite. Hence, since a positive definite function is bounded in modulus by its value at 0 and $S_n(0) = f(0)$, we must have

$$|f(x) - S_n(x)| \le |f(x)| + |S_n(x)| \le 2 f(0), \qquad x \in \mathbb{R}. \tag{1.6}$$

Similarly, it is always the case that $|f(x) - Q_n(x)| \le 2 f(0)$.

It will be shown that Prony's method can be used to compute $S_n(x)$. Although Prony's method in this problem must become numerically ill-conditioned for $n$ sufficiently large, it may nonetheless be useful for small $n$ (say, $n \le 10$). Numerically stable methods for computing $S_n(x)$ suitable for all $n$ would require an algorithm other than Prony's method. This is discussed at the end of Section II.

The  computation  of  approximations  Qn(x)  is  shown  to  depend  upon  the  ability 
to  compute  the  numerical  value  of  the  inverse  function  of  V  (guaranteed  to  exist  by 
additional  assumptions)  at  specific  points.  The  level  of  difficulty  involved  depends 
on  V,  of  course,  but  the  interval  is  finite,  so  the  problem  seems  to  encounter  no 
inherent  numerical  difficulties. 

An excellent bibliography of references to the literature on exponential approximation is contained in [6].

Note that if $V$ in (1.3) is continuously differentiable on the interval $(-A, A)$, then $V'(\sigma) \ge 0$, and

$$f(x) = \int_{-A}^{A} e^{i\sigma x}\, V'(\sigma)\, d\sigma. \tag{1.7}$$

From the Paley-Wiener Theorem (see, e.g., [7, p. 134]), this equation uniquely extends the domain of $f$ to all of $\mathbb{C}$, and this extension of $f$ is an entire function of exponential type at most $A$.




We close this section with a small collection of positive definite functions. According to [3], Schoenberg proved that $f_r(x) = \exp[-|x|^r]$ is positive definite if and only if $0 < r \le 2$, and Pólya proved that any real, even, continuous function $f$ that is convex on the interval $(0, \infty)$, that is, $f((x+y)/2) \le (f(x) + f(y))/2$, and satisfies $\lim_{x \to \infty} f(x) = 0$, is positive definite. Goldberg [4, p. 61] proves that if $f(x)$ is positive definite and $a > 0$, then the function $h(x) = f(x) \exp(-a x^2)$ is also positive definite. Finally, if the function $f$ has a Fourier transform that is nonnegative and integrable, then the function $f$ is positive definite. Specific examples of functions satisfying this latter property are

$$\frac{\sin Ax}{x} = \frac{1}{2} \int_{-A}^{A} e^{i\sigma x}\, d\sigma, \tag{1.8}$$

$$\frac{2(1 - \cos Ax)}{A x^2} = \int_{-A}^{A} e^{i\sigma x} \Big(1 - \frac{|\sigma|}{A}\Big)\, d\sigma, \tag{1.9}$$

$$e^{-|x|} = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{e^{i\sigma x}}{1 + \sigma^2}\, d\sigma, \tag{1.10}$$

$$e^{-a x^2} = \frac{1}{\sqrt{4\pi a}} \int_{-\infty}^{\infty} e^{i\sigma x}\, e^{-\sigma^2/(4a)}\, d\sigma \quad (a > 0), \tag{1.11}$$

$$\frac{J_\nu(Ax)}{x^\nu} = \frac{1}{(2A)^\nu\, \pi^{1/2}\, \Gamma(\nu + \tfrac{1}{2})} \int_{-A}^{A} (A^2 - \sigma^2)^{\nu - 1/2}\, e^{i\sigma x}\, d\sigma \quad (\nu > -\tfrac{1}{2}), \tag{1.12}$$

where $J_\nu(x)$ is the usual Bessel function of order $\nu$. A final example, one that finds application in antenna design ([8], [9], and [10]), is

$$\cosh\sqrt{a^2 - A^2 x^2} = \int_{-A}^{A} e^{i\sigma x}\, dV(\sigma), \qquad a \ge 0, \tag{1.13}$$

where $V(-A) = -1/2$,

$$V(\sigma) = \frac{a}{2} \int_{-A}^{\sigma} \frac{I_1\big(a\sqrt{1 - t^2/A^2}\big)}{(A^2 - t^2)^{1/2}}\, dt, \qquad -A < \sigma < A,$$

$V(+A) = 1/2 + \lim_{\sigma \to A^-} V(\sigma)$, and where $I_1(x)$ is the modified Bessel function of order one. This function is interesting because, for $|x| \ge a/A$, it has magnitude not exceeding 1, while for $|x| < a/A$, it exhibits very rapid growth, achieving a maximum magnitude of $\cosh(a)$ at $x = 0$. Other examples can be discerned in various tables of integral transforms, such as [11].
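The pair (1.13) can be checked numerically under the reading given above, namely $f(x) = \cosh\sqrt{a^2 - A^2 x^2}$, jumps of $1/2$ in $V$ at $\pm A$, and continuous density $(a/2)\, I_1(a\sqrt{1 - \sigma^2/A^2})/\sqrt{A^2 - \sigma^2}$; that reading is an assumption of this sketch. The substitution $\sigma = A \sin\theta$ removes the endpoint singularity, and $I_1$ is computed from its power series by a hypothetical helper rather than a library call.

```python
import numpy as np

def bessel_i1(z, terms=40):
    """Modified Bessel function I_1(z) from its power series (adequate for moderate z)."""
    z = np.asarray(z, dtype=float)
    total = np.zeros_like(z)
    term = z / 2.0                                   # m = 0 term: (z/2)^1 / (0! 1!)
    for m in range(terms):
        total = total + term
        term = term * (z / 2.0) ** 2 / ((m + 1) * (m + 2))
    return total

def f_closed(x, a, A):
    # cosh(sqrt(a^2 - A^2 x^2)); equals cos(sqrt(A^2 x^2 - a^2)) when A|x| > a
    return np.cosh(np.sqrt(complex(a * a - (A * x) ** 2))).real

def f_from_measure(x, a, A, nodes=100):
    # Jumps of 1/2 at sigma = -A and sigma = +A contribute cos(A x).
    # Continuous part with sigma = A sin(theta): (a/2) * int I_1(a cos theta) cos(A x sin theta) d theta.
    t, w = np.polynomial.legendre.leggauss(nodes)
    theta = 0.5 * np.pi * t                          # map [-1, 1] onto [-pi/2, pi/2]
    integrand = bessel_i1(a * np.cos(theta)) * np.cos(A * x * np.sin(theta))
    continuous = 0.5 * a * (0.5 * np.pi) * np.dot(w, integrand)
    return np.cos(A * x) + continuous

for x in (0.0, 0.5, 1.0, 3.0):
    print(x, f_closed(x, 2.0, 1.5), f_from_measure(x, 2.0, 1.5))
```

At $x = 0$ both columns should equal $\cosh 2$, and for $|x| > a/A$ both stay within $[-1, 1]$, matching the behavior described in the text.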




II.  Exponential  Approximation  With 
Arbitrary  Coefficients 




The idea developed in this section for constructing approximations of the form $S_n(x)$ is simply Gaussian quadrature. A glance at equation (1.3) reveals that we are particularly interested in Gaussian quadrature with respect to the measure $dV(\sigma)$. From Szegő [12, p. 25], a system of orthogonal polynomials exists for the measure $dV(\sigma)$ if $V(\sigma)$ has infinitely many points of increase in the interval $[-A, A]$ and if the moments

$$c_m = \int_{-A}^{A} \sigma^m\, dV(\sigma), \qquad m = 0, 1, 2, \ldots \tag{2.1}$$

exist. Since $V$ is bounded and the interval is finite, the moments $c_m$ certainly exist. If $V$ has finitely many points of increase, then $f$ can be written explicitly as a finite sum of exponentials. Although this special case is not uninteresting (in the context of economizing large finite exponential sums), we will avoid it by assuming that $V$ has infinitely many points of increase.


Let $\sigma_1, \ldots, \sigma_n$ be the abscissas and let $b_1, \ldots, b_n$ be the corresponding Cotes numbers of the $n$-th order Gaussian quadrature formula with respect to the measure $dV(\sigma)$. Since

$$\sum_{k=1}^{n} b_k = \int_{-A}^{A} 1\, dV(\sigma) = f(0),$$

we rewrite the Cotes numbers in the form $b_k = \alpha_k f(0)$, $k = 1, \ldots, n$. Using this notation, and applying the quadrature formula blindly to (1.3) gives the approximation

$$S_n(x) = f(0) \sum_{k=1}^{n} \alpha_k e^{i\sigma_k x}, \tag{2.2}$$

where

$$0 < \alpha_k < 1, \qquad k = 1, \ldots, n, \tag{2.3}$$

$$\alpha_1 + \cdots + \alpha_n = 1, \tag{2.4}$$

$$|\sigma_k| < A, \qquad k = 1, \ldots, n. \tag{2.5}$$

These three properties are immediate consequences of well-known results on Gaussian quadrature. (See Szegő [12, pp. 47-49].) In addition, these properties imply that $S_n(x)$ possesses good round-off error behavior when the sum is evaluated numerically.

We seek an error bound such that

$$|f(x) - S_n(x)| \le R_n(x), \qquad x \in \mathbb{R}. \tag{2.6}$$

It is clear from (1.3) and the Riemann-Lebesgue Lemma that $f(x) \to 0$ as $x \to \infty$ (when $V$ is absolutely continuous). On the other hand, it is not hard to see from (2.2) that $S_n(x)$ cannot tend to zero as $x \to \infty$. The most that can be expected is that $R_n(x)$ becomes "small" for any fixed $x$. We will show that, as $n \to \infty$, $S_n$ converges to $f$ uniformly on any finite real interval.




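Let $n \ge 1$. For each $x \in \mathbb{R}$, let

$$\varepsilon_n(x) = \min_{\pi_{2n-1}} \ \max_{-A \le \sigma \le A} \big| e^{i\sigma x} - \pi_{2n-1}(\sigma) \big|, \tag{2.7}$$

where the minimum is taken over all polynomials $\pi_{2n-1}(\sigma)$ of degree at most $2n-1$ with complex coefficients. We always have $\varepsilon_n(x) \le 1$ for all $x$, as can be seen by considering the case $\pi_{2n-1}(\sigma) \equiv 0$ in (2.7).

Lemma 1. For $n \ge 1$,

$$\varepsilon_n(x) \le \frac{\sqrt{2}\, (A|x|)^{2n}}{2^{2n-1}\, (2n)!}, \qquad x \in \mathbb{R}. \tag{2.8}$$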

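Proof. From a theorem given in [13, p. 78], for any real-valued function $p(\sigma)$ defined on the interval $[-1, +1]$ and possessing $n+1$ continuous derivatives on $(-1, +1)$, we have

$$E_n(p) \equiv \min_{\pi_n} \ \max_{-1 \le \sigma \le 1} |p(\sigma) - \pi_n(\sigma)| = \frac{|p^{(n+1)}(\xi)|}{2^n\, (n+1)!}$$

for some $\xi$, $-1 < \xi < +1$, where the minimum is taken over all real polynomials $\pi_n$ of degree at most $n$. For $p(\sigma) = \cos \sigma A x$ defined for $\sigma$ in the interval $[-1, 1]$ and for fixed real numbers $A$ and $x$, we have

$$E_{2n-1}(p) = \frac{(Ax)^{2n}}{2^{2n-1}\, (2n)!}\, |\cos \xi A x| \le \frac{(Ax)^{2n}}{2^{2n-1}\, (2n)!}.$$

For $q(\sigma) = \sin \sigma A x$ on $[-1, 1]$, we have similarly

$$E_{2n-1}(q) \le \frac{(Ax)^{2n}}{2^{2n-1}\, (2n)!}.$$

From the definition of $\varepsilon_n(x)$ (after the change of variable $\sigma \to A\sigma$), and taking $\pi_{2n-1} = p^* + i q^*$ with $p^*$ and $q^*$ the best real approximations to $p$ and $q$, we have

$$\varepsilon_n(x) = \min_{\pi_{2n-1}} \ \max_{-1 \le \sigma \le 1} \big| e^{i\sigma A x} - \pi_{2n-1}(A\sigma) \big| \le \big\{ E_{2n-1}^2(p) + E_{2n-1}^2(q) \big\}^{1/2}.$$

Substituting the estimates for $E_{2n-1}(p)$ and $E_{2n-1}(q)$ completes the proof.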
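Lemma 1 can be illustrated with a different, standard route to an upper bound on $\varepsilon_n(x)$: interpolating the real and imaginary parts of $e^{i\sigma x}$ at the $2n$ Chebyshev points of the first kind gives a polynomial of degree $2n-1$ whose error obeys the same $\sqrt{2}\,(A|x|)^{2n}/(2^{2n-1}(2n)!)$ estimate, and $\varepsilon_n(x)$ can only be smaller. This sketch is not the proof's method (which cites a best-approximation theorem), and the maximum is only sampled on a grid; the value $A = 1.3$ in the usage is arbitrary.

```python
import numpy as np
from math import factorial, sqrt
from numpy.polynomial import chebyshev as C

def cheb_interp_error(x, n, A, samples=2001):
    """Sampled max error on [-A, A] of interpolating exp(i sigma x) at the 2n Chebyshev
    points by a polynomial of degree 2n - 1 (real and imaginary parts done separately)."""
    k = np.arange(2 * n)
    t = np.cos((2 * k + 1) * np.pi / (4 * n))        # Chebyshev points of the first kind on [-1, 1]
    c_re = C.chebfit(t, np.cos(A * t * x), 2 * n - 1) # 2n points, degree 2n - 1: exact interpolation
    c_im = C.chebfit(t, np.sin(A * t * x), 2 * n - 1)
    s = np.linspace(-1.0, 1.0, samples)               # s = sigma / A
    approx = C.chebval(s, c_re) + 1j * C.chebval(s, c_im)
    return np.max(np.abs(np.exp(1j * A * s * x) - approx))

def lemma1_bound(x, n, A):
    """Right-hand side of (2.8)."""
    return sqrt(2) * (A * abs(x)) ** (2 * n) / (2 ** (2 * n - 1) * factorial(2 * n))

for n in (1, 2, 3, 4):
    print(n, cheb_interp_error(2.0, n, 1.3), lemma1_bound(2.0, n, 1.3))
```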

We remark, but do not prove, that an example in Meinardus [13, p. 96] can be extended and used to show that, for fixed $x$,

$$\varepsilon_n(x) \le \frac{(A|x|)^{2n}}{2^{2n-1}\, (2n)!} \big(1 + o(1)\big), \qquad n \to \infty.$$



It  seems  reasonable  to  conjecture  that  this  asymptotic  inequality  is  actually  an 
asymptotic  equality.  In  any  event,  we  use  only  (2.8)  in  this  report. 

Theorem 1. Let $f(x)$ be a continuous complex-valued positive definite function of a real variable such that

$$f(x) = \int_{-A}^{A} e^{i\sigma x}\, dV(\sigma), \qquad x \in \mathbb{R}, \tag{2.9}$$

where $V$ is a bounded non-decreasing function having infinitely many points of increase in the finite closed interval $[-A, A]$. Then, for each integer $n \ge 1$, there exist distinct real numbers $\sigma_1, \ldots, \sigma_n$ and real numbers $\alpha_1, \ldots, \alpha_n$ satisfying (2.3), (2.4), and (2.5) and the additional condition

$$|f(x) - S_n(x)| \le \frac{\sqrt{2}\, f(0)\, (A|x|)^{2n}}{2^{2n-2}\, (2n)!}, \qquad x \in \mathbb{R}, \tag{2.10}$$

where $S_n(x)$ is given by (2.2). Furthermore, the left-hand side of (2.10) is never larger than $2f(0)$ for all $x \in \mathbb{R}$ and every integer $n \ge 1$.

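Proof. Let $\sigma_1, \ldots, \sigma_n$ be the distinct abscissas of an $n$-point Gaussian quadrature formula with respect to the measure $dV(\sigma)$, and let $b_1, \ldots, b_n$ be the corresponding Cotes numbers. Let the numbers $\alpha_1, \ldots, \alpha_n$ be defined by the relationship $b_k = f(0)\, \alpha_k$, $k = 1, \ldots, n$. Equations (2.3), (2.4), and (2.5) are then satisfied. Fix $x \in \mathbb{R}$. Let $p^*(\sigma)$ be any polynomial of degree at most $2n-1$ such that

$$\max_{-A \le \sigma \le A} |e^{i\sigma x} - p^*(\sigma)| = \varepsilon_n(x),$$

where $\varepsilon_n(x)$ is defined by (2.7). Because the $n$-point Gaussian formula integrates every polynomial of degree at most $2n-1$ exactly, $f(0) \sum_{k} \alpha_k p^*(\sigma_k) = \int_{-A}^{A} p^*(\sigma)\, dV(\sigma)$. Then, defining $S_n(x)$ as in (2.2), we have

$$\begin{aligned}
|f(x) - S_n(x)| &\le \Big| f(x) - f(0) \sum_{k=1}^{n} \alpha_k p^*(\sigma_k) \Big| + \Big| f(0) \sum_{k=1}^{n} \alpha_k p^*(\sigma_k) - S_n(x) \Big| \\
&\le \int_{-A}^{A} |e^{i\sigma x} - p^*(\sigma)|\, dV(\sigma) + f(0) \sum_{k=1}^{n} \alpha_k\, |p^*(\sigma_k) - e^{i\sigma_k x}| \\
&\le \varepsilon_n(x) \int_{-A}^{A} dV(\sigma) + f(0)\, \varepsilon_n(x) \sum_{k=1}^{n} \alpha_k \\
&= 2 f(0)\, \varepsilon_n(x).
\end{aligned}$$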




Since $\varepsilon_n(x) \le 1$ is always true, recalling Lemma 1 completes the proof.
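Theorem 1's construction is straightforward to try when the Gaussian rule is a classical one. The sketch below assumes the measure $dV(\sigma) = d\sigma/2$ on $[-1, 1]$, so that $f(x) = \sin x / x$, $f(0) = 1$, $A = 1$, and the Gaussian rule is Gauss-Legendre (numpy's `leggauss`, whose weights sum to 2 and are halved to obtain the $\alpha_k$). It builds $S_n(x)$ from (2.2) and compares the error with the bound (2.10), capped at $2f(0)$.

```python
import numpy as np
from math import factorial, sqrt

def S_n(x, n):
    """Approximation (2.2) for f(x) = sin(x)/x = int_{-1}^{1} e^{i sigma x} d(sigma)/2."""
    sigma, b = np.polynomial.legendre.leggauss(n)    # abscissas and Cotes numbers for d(sigma)
    alpha = b / 2.0                                  # alpha_k = b_k / f(0) for dV = d(sigma)/2
    return np.exp(1j * np.outer(np.atleast_1d(x), sigma)) @ alpha

def theorem1_bound(x, n, A=1.0, f0=1.0):
    """Right-hand side of (2.10), capped at the trivial bound 2 f(0)."""
    bound = sqrt(2) * f0 * (A * np.abs(x)) ** (2 * n) / (2 ** (2 * n - 2) * factorial(2 * n))
    return np.minimum(bound, 2.0 * f0)

x = np.linspace(0.0, 12.0, 121)
for n in (3, 6, 9):
    err = np.abs(np.sinc(x / np.pi) - S_n(x, n))
    print(n, err.max(), bool(np.all(err <= theorem1_bound(x, n) + 1e-13)))
```

The small additive tolerance covers floating-point round-off where the bound itself is far below machine precision (small $|x|$).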

Corollary 1.1. Any sequence of approximations $S_n(x)$, $n = 1, 2, \ldots$, satisfying Theorem 1 converges uniformly to $f(x)$ on every finite interval.

Proof. Immediate.

Corollary 1.2. If, in addition to the requirements of Theorem 1, $f(x)$ is real valued, then for each integer $n \ge 1$, there exist distinct real numbers $\beta_1, \ldots, \beta_n$ and real numbers $d_1, \ldots, d_n$ that satisfy

$$0 < d_k < 1, \qquad k = 1, \ldots, n, \tag{2.11}$$

$$d_1 + d_2 + \cdots + d_n = 1, \tag{2.12}$$

$$0 < \beta_k < A, \qquad k = 1, \ldots, n, \tag{2.13}$$

and are such that

$$\Big| f(x) - f(0) \sum_{k=1}^{n} d_k \cos \beta_k x \Big| \le \frac{f(0)\, (A|x|)^{4n}}{2^{4n-2}\, (4n)!} \tag{2.14}$$

for all $x \in \mathbb{R}$. Furthermore, the left-hand side of (2.14) is bounded from above by $2f(0)$ for all $x \in \mathbb{R}$ and every integer $n \ge 1$.

Proof. Since $f(x)$ is real valued, by conjugating (1.1) (equivalently, from $f(-x) = \overline{f(x)}$) we see that it must be even. From $f(x) = (f(x) + f(-x))/2$ and (2.9), we get

$$f(x) = \int_{-A}^{A} \cos \sigma x\, dV(\sigma). \tag{2.15}$$

Furthermore, the measure $dV(\sigma)$ can be taken to be symmetric about 0. For each $n \ge 1$, and for each fixed $x \in \mathbb{R}$, define

$$\varepsilon_n'(x) = \min_{\pi_{2n-1}} \ \max_{-A \le \sigma \le A} |\cos \sigma x - \pi_{2n-1}(\sigma)|,$$

where the minimum is taken over all polynomials $\pi_{2n-1}(\sigma)$ of degree at most $2n-1$ with real coefficients. Hence, we always have $\varepsilon_n'(x) \le 1$, by considering the case $\pi_{2n-1}(\sigma) \equiv 0$. From the proof of Lemma 1,

$$\varepsilon_n'(x) \le \frac{(A|x|)^{2n}}{2^{2n-1}\, (2n)!}, \qquad x \in \mathbb{R}.$$

Duplicating the proof of Theorem 1 with $2n$ replacing $n$ gives

$$\Big| f(x) - f(0) \sum_{k=1}^{2n} \alpha_k \cos \sigma_k x \Big| \le 2 f(0)\, \varepsilon_{2n}'(x)$$

for the distinct real numbers $\sigma_1, \ldots, \sigma_{2n}$ and real numbers $\alpha_1, \ldots, \alpha_{2n}$ that are the abscissas and normalized Cotes numbers, respectively, of the Gaussian quadrature of order $2n$ with respect to the measure $dV(\sigma)$. These abscissas and Cotes numbers satisfy




(2.3), (2.4), and (2.5). Since the measure $dV(\sigma)$ is symmetric about zero, it must be that $\sigma_1 = -\sigma_{2n}$, $\sigma_2 = -\sigma_{2n-1}$, etc., and that $\alpha_1 = \alpha_{2n}$, $\alpha_2 = \alpha_{2n-1}$, etc. Inequality (2.14) follows immediately by taking $d_k = 2\alpha_{n+k}$ and $\beta_k = \sigma_{n+k}$ for $k = 1, 2, \ldots, n$ (with the abscissas in increasing order). The properties (2.11), (2.12), and (2.13) follow from (2.3), (2.4), and (2.5). This completes the proof.


Example 1. The real-valued function

$$f(x) = \frac{2 \sin x}{x} = \int_{-1}^{1} e^{i\sigma x}\, dV(\sigma), \tag{2.16}$$

with $V(\sigma) = \sigma$, $-1 \le \sigma \le 1$, is a positive definite function on $\mathbb{R}$. In this case, Gaussian quadrature with respect to the measure $dV(\sigma)$ is Gauss-Legendre quadrature. Thus, from the proof of Corollary 1.2 (after dividing by $f(0) = 2$), for each $n \ge 1$ we have

$$\Big| \frac{\sin x}{x} - \sum_{k=1}^{n} d_k \cos \beta_k x \Big| \le \frac{x^{4n}}{2^{4n-2}\, (4n)!}, \tag{2.17}$$

where $\beta_1, \ldots, \beta_n$ are the positive abscissas of a $2n$-point Gauss-Legendre quadrature and $d_1, \ldots, d_n$ are the corresponding Cotes numbers. This example and some computation provide one test of the quality of the error term. Let $R_n(x)$ be the smaller of the two numbers 2 (the trivial bound, since $|\sin x / x| \le 1$ and $\sum_k d_k = 1$) and

$$\frac{x^{4n}}{2^{4n-2}\, (4n)!} \approx \Big(\frac{e x}{8n}\Big)^{4n} \sqrt{\frac{2}{\pi n}}. \tag{2.18}$$

From Table 1, it appears that $R_n(x)$ is an excellent error bound provided $|x| \ll 8n/e$. (Table 1 was computed on a DEC VAX 11/780 on which the double precision unit round-off error is only $4 \times 10^{-17}$.)


Table 1. Comparison of (2.17) for n = 10 ($8n/e \approx 29.43$); here $f(x) = \sin x / x$ and $S_{10}(x) = \sum_{k=1}^{10} d_k \cos \beta_k x$.

  x    R_10(x)          f(x) - S_10(x)    max |f(y) - S_10(y)|, 0 <= y <= x
  5    .407 x 10^-31    underflow         underflow
 10    .447 x 10^-19    underflow         underflow
 15    .494 x 10^-12    .491 x 10^-13     .491 x 10^-13
 20    .491 x 10^-7     .164 x 10^-8      .164 x 10^-8
 25    .370 x 10^-3     .290 x 10^-5      .290 x 10^-5
 30    .543             .664 x 10^-3      .664 x 10^-3
 35    .200 x 10^1      .302 x 10^-1      .302 x 10^-1
 40    .200 x 10^1      .309              .309
 45    .200 x 10^1      .501              .569
 50    .200 x 10^1      -.364             .569
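The $R_{10}$ column and the error column of Table 1 can be recomputed in double precision. The sketch below is a reconstruction, not the original VAX program: it evaluates the bound (2.17) in logarithms to avoid overflow and forms the cosine sum from the positive nodes and weights of numpy's 20-point Gauss-Legendre rule. The $R_{10}$ values it prints should agree closely with the table; the errors are only expected to match where they exceed round-off.

```python
import numpy as np
from math import lgamma, log, exp

def R(x, n):
    """min(2, x^{4n} / (2^{4n-2} (4n)!)) from (2.17)-(2.18), computed via logarithms."""
    log_b = 4 * n * log(abs(x)) - (4 * n - 2) * log(2.0) - lgamma(4 * n + 1)
    return min(2.0, exp(log_b))

def cosine_sum(x, n):
    # beta_k: positive abscissas of the 2n-point Gauss-Legendre rule; d_k: their Cotes numbers
    nodes, weights = np.polynomial.legendre.leggauss(2 * n)
    pos = nodes > 0
    return np.dot(weights[pos], np.cos(nodes[pos] * x))

n = 10
for x in (15, 20, 25, 30, 35, 40):
    err = np.sinc(x / np.pi) - cosine_sum(x, n)     # sin(x)/x - S_10(x)
    print(f"x={x:2d}  R_10={R(x, n):.3e}  f-S_10={err:.3e}")
```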

This section is concluded by showing that Prony's method (see, e.g., [14, p. 378] or [15, p. 340]) can be used to compute numerically the approximations of Theorem 1. We need only find the $n$-point Gaussian quadrature formula (exact through degree $2n-1$) with respect to the measure $dV(\sigma)$, which is equivalent to solving the equations




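$$c_m = \int_{-A}^{A} \sigma^m\, dV(\sigma) = \sum_{k=1}^{n} b_k\, \sigma_k^m, \qquad m = 0, 1, \ldots, 2n-1, \tag{2.19}$$

for $b_1, \ldots, b_n$ and $\sigma_1, \ldots, \sigma_n$. Since $\sigma_k$ must be real, write $s_k = \ln \sigma_k$ if $\sigma_k \ne 0$, and $s_k = 0$ if $\sigma_k = 0$. The required equations can now be written as

$$c_m = \sum_{k=1}^{n} b_k\, e^{m s_k}, \qquad m = 0, 1, \ldots, 2n-1.$$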

This form is precisely the form required for Prony's method. (The use of Prony's method to compute Gaussian quadrature formulas was pointed out to the author by Marvin J. Goldstein.) In principle, to compute one approximation for which Theorem 1 holds, we require $2n$ quadratures (for the moments), the solutions of two systems of linear equations each of rank $n$, and the roots of a polynomial of degree $n$ (in this case, all its roots are known to be real and distinct, each of multiplicity one). Unfortunately, it is known [16] that any procedure that relies upon the moments must become increasingly ill-conditioned numerically as $n$ increases. Fortunately, the use of modified moments (i.e., replacing the $\sigma^m$ in (2.19) with some classical system of orthogonal polynomials) together with an algorithm other than Prony's method often results in a numerically well-conditioned problem for finite intervals. See [17] and [18] for details.
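The Prony route described above can be sketched in a few lines: a Hankel system in the moments gives the monic node polynomial, its roots are the abscissas, and a Vandermonde system gives the Cotes numbers. This is a minimal illustration, assuming the measure $dV(\sigma) = d\sigma/2$ on $[-1, 1]$ (moments $c_m = 1/(m+1)$ for even $m$, 0 for odd $m$) so the result can be checked against numpy's Gauss-Legendre rule; the function name is hypothetical. It also reports the condition number of the Hankel matrix, which grows rapidly with $n$, as the text warns.

```python
import numpy as np

def gauss_from_moments(c, n):
    """n-point Gaussian rule from moments c[0..2n-1] via Prony's method (ill-conditioned for large n)."""
    c = np.asarray(c, dtype=float)
    H = np.array([[c[m + j] for j in range(n)] for m in range(n)])   # Hankel matrix c_{m+j}
    p = np.linalg.solve(H, -c[n:2 * n])                              # monic node polynomial: p_0 .. p_{n-1}
    nodes = np.sort(np.roots(np.concatenate(([1.0], p[::-1]))).real)
    Vdm = np.vander(nodes, n, increasing=True).T                     # Vdm[m, k] = nodes[k] ** m
    weights = np.linalg.solve(Vdm, c[:n])                            # Cotes numbers b_k
    return nodes, weights, np.linalg.cond(H)

# Moments of dV(sigma) = d(sigma)/2 on [-1, 1]
moments = [1.0 / (m + 1) if m % 2 == 0 else 0.0 for m in range(20)]
nodes, weights, cond5 = gauss_from_moments(moments, 5)
ref_nodes, ref_weights = np.polynomial.legendre.leggauss(5)
print(np.max(np.abs(nodes - ref_nodes)), np.max(np.abs(weights - ref_weights / 2)), cond5)
```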




III. Exponential Approximation With
Uniform Coefficients

The idea developed in this section for constructing approximations of the form $Q_n(x)$, in which each exponential term enters the approximation with equal weight, is basically probabilistic in nature. The integral representation (1.3) of $f(x)$ is approximated by a Riemann sum whose subintervals are equally probable according to the "probability" measure $dV(\sigma)$. In this interpretation, $V(\sigma)$ is a cumulative probability integral that is used to transform $n$ uniformly distributed points in the range of $V(\sigma)$ into $n$ abscissas on the real line distributed according to the measure $dV(\sigma)$. (See [19, p. 314] or [15, p. 389].)

Theorem 2. Let $f(x)$ be a continuous complex-valued positive definite function of a real variable such that

$$f(x) = \int_{-A}^{A} e^{i\sigma x}\, dV(\sigma), \qquad x \in \mathbb{R}, \tag{3.1}$$

where $V$ is a continuous and strictly monotone increasing function throughout the finite closed interval $[-A, A]$. Then, for each integer $n \ge 1$, there exist distinct real numbers $\sigma_1, \ldots, \sigma_n$ in the open interval $(-A, A)$ such that

$$|f(x) - Q_n(x)| \le 2\sqrt{2}\, f(0)\, A|x|/n, \qquad x \in \mathbb{R}, \tag{3.2}$$

where

$$Q_n(x) = \frac{f(0)}{n} \sum_{k=1}^{n} e^{i\sigma_k x}. \tag{3.3}$$

Furthermore, from the remark following (1.6), the left-hand side of (3.2) is bounded from above by $2f(0)$ for all $x \in \mathbb{R}$.


Proof. Let the real number $x$ be fixed throughout this proof. Define, for $k = 0, 1, 2, \ldots, 2n$,

$$u_k = V(-A) + \big(V(A) - V(-A)\big)\, \frac{k}{2n}, \qquad v_k = V^{-1}(u_k). \tag{3.4}$$

Under the hypotheses on $V(\sigma)$, it is clear that $V^{-1}$ exists and is continuous and strictly monotone on the closed interval $[V(-A), V(A)]$. Hence, the numbers $v_k$ in (3.4) are well defined and are distinct. It will be shown that inequality (3.2) holds for

$$\sigma_k = v_{2k-1}, \qquad k = 1, 2, \ldots, n. \tag{3.5}$$

Since

$$V(v_{k+1}) - V(v_k) = \frac{V(A) - V(-A)}{2n} = \frac{f(0)}{2n}, \qquad k = 0, 1, \ldots, 2n-1, \tag{3.6}$$

it follows from the definition (3.3) of $Q_n(x)$ that




Q„(x)  =  Z  f  k  exp(i  v2k_,x)dV(o)  .  (3.7) 

^  *  1  v2k_2 

By  the  Mean  Value  Theorem,  there  exists  £k  in  the  interval  between  a  and  v,k_,  such 
that 

cos  ax  -  cos  v2k_,x  =  -x(a-v2k_,)  sin  ikx  .  (3.8) 

Thus,  a  <  v2k  ,  implies  a  <  £k  <  v2k_,  and  v2k_,  <  a  implies  v2k_,  <  £k  <  a.  From  (3.1) 
and  (3.7), 

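$$ \begin{aligned} \operatorname{Re}\bigl(f(x) - Q_n(x)\bigr) &= \sum_{k=1}^{n} \int_{v_{2k-2}}^{v_{2k}} (\cos \sigma x - \cos v_{2k-1} x)\, dV(\sigma) \\ &= -x \sum_{k=1}^{n} \int_{v_{2k-2}}^{v_{2k}} (\sigma - v_{2k-1}) \sin \xi_k x\, dV(\sigma), \end{aligned} \tag{3.9} $$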

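and so, taking absolute values and using $|\sin \xi_k x| \le 1$,

$$ \begin{aligned} \bigl|\operatorname{Re}\bigl(f(x) - Q_n(x)\bigr)\bigr| &\le |x| \sum_{k=1}^{n} \int_{v_{2k-2}}^{v_{2k}} |\sigma - v_{2k-1}|\, dV(\sigma) \\ &\le |x| \sum_{k=1}^{n} (v_{2k} - v_{2k-2})\bigl(V(v_{2k}) - V(v_{2k-2})\bigr) \\ &= \frac{2\Lambda |x| f(0)}{n}, \end{aligned} \tag{3.10} $$

where (3.6) and the telescoping sum $\sum_{k=1}^{n} (v_{2k} - v_{2k-2}) = v_{2n} - v_0 = 2\Lambda$ were used in the last step. Similarly,

$$ \bigl|\operatorname{Im}\bigl(f(x) - Q_n(x)\bigr)\bigr| \le \frac{2\Lambda |x| f(0)}{n}. \tag{3.11} $$

Clearly, (3.10) and (3.11) together complete the proof, since $|z| \le \sqrt{2}\,\max(|\operatorname{Re} z|, |\operatorname{Im} z|)$ for any complex z.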
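The construction in this proof is easy to carry out numerically when $V^{-1}$ is available in closed form. The Python sketch below is illustrative only: the names `theorem2_nodes` and `q_n` are ours, and it assumes the caller supplies $V^{-1}$ together with the endpoint values $V(-\Lambda)$ and $V(\Lambda)$. It forms the odd-indexed points $v_{2k-1}$ of (3.4)-(3.5) and evaluates (3.3) for a simple measure chosen for the illustration.

```python
import cmath
import math


def theorem2_nodes(v_inv, v_left, v_right, n):
    """Return sigma_k = v_{2k-1} = V^{-1}(u_{2k-1}), k = 1..n, per (3.4)-(3.5).

    v_inv   -- inverse of V on [V(-Lambda), V(Lambda)] (assumed known in closed form)
    v_left  -- V(-Lambda)
    v_right -- V(Lambda); v_right - v_left equals f(0) by (3.1)
    """
    mass = v_right - v_left
    return [v_inv(v_left + mass * (2 * k - 1) / (2 * n)) for k in range(1, n + 1)]


def q_n(x, nodes, f0):
    """Evaluate Q_n(x) = (f(0)/n) * sum_k exp(i * sigma_k * x), equation (3.3)."""
    return f0 / len(nodes) * sum(cmath.exp(1j * s * x) for s in nodes)


# Illustration (our choice): V(sigma) = sigma/2 on [-1, 1], so Lambda = 1,
# f(x) = sin(x)/x, f(0) = 1, and V^{-1}(t) = 2t.
n = 8
nodes = theorem2_nodes(lambda t: 2.0 * t, -0.5, 0.5, n)
for x in (0.5, 2.0, 7.0):
    err = abs(math.sin(x) / x - q_n(x, nodes, 1.0))
    bound = 2 * math.sqrt(2) * 1.0 * 1.0 * abs(x) / n  # right-hand side of (3.2)
    print(f"x={x}: |f - Q_n| = {err:.2e} <= {bound:.2e}")
```

For this symmetric measure the nodes are $-1 + (2k-1)/n$, so $Q_n$ is real, and the observed errors sit well below the bound (3.2).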

Corollary  2.1.  Any  sequence  of  approximations  Qn(x),  n  =  1,2,  ...  ,  satisfying 
Theorem  2  converges  uniformly  to  f(x)  on  every  finite  interval. 

Proof.  Immediate. 

Corollary 2.2. If, in addition to the requirements of Theorem 2, f(x) is real valued, then for each integer $n \ge 1$, there exist distinct real numbers $\beta_1, \ldots, \beta_n$ in the open interval $(0, \Lambda)$ such that

$$ \Bigl| f(x) - \frac{f(0)}{n} \sum_{k=1}^{n} \cos \beta_k x \Bigr| \le \frac{f(0)\, \Lambda |x|}{n}. \tag{3.12} $$

Furthermore, the left-hand side of (3.12) is never larger than $2f(0)$ for all $x \in \mathbb{R}$ and integers $n \ge 1$.

Proof. Recalling (2.15), follow the proof of Theorem 2 with 2n replacing n throughout. Half of the resulting 2n $\sigma_k$'s are positive. Set the $\beta_k$'s equal to the positive $\sigma_k$'s. The details are immediate. This concludes the proof.




The proof of Theorem 2 requires that V be continuous and strictly monotone increasing. It is not clear whether these hypotheses on V can be weakened. On the other hand, the convergence rate of the approximations $Q_n$ can apparently be improved by making further assumptions concerning V. In general, however, convergence rates better than $n^{-2}$ cannot be expected. Consider Example 1, where $V(\sigma) = \sigma$ and $f(x) = x^{-1} \sin x$. From the construction indicated in Corollary 2.2, f(x) is approximated by


$$ Q_n(x) = \frac{1}{n} \sum_{k=1}^{n} \cos \frac{(2k-1)x}{2n} = \frac{\sin x}{2n \sin(x/2n)}. \tag{3.13} $$


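It can be shown directly that

$$ \Bigl| \frac{\sin x}{x} - Q_n(x) \Bigr| \le \frac{x^2}{24 n^2}, \qquad x \in \mathbb{R}, \tag{3.14} $$

and

$$ \lim_{n \to \infty} n^2 \bigl[x^{-1} \sin x - Q_n(x)\bigr] = -\frac{x \sin x}{24}. \tag{3.15} $$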


Hence, in this example, the correct convergence rate is precisely $n^{-2}$ for each fixed x.
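The closed form in (3.13) and the limit (3.15) are easy to check numerically. The Python sketch below (function names are ours) compares the cosine sum with its closed form and prints $n^2[\sin x/x - Q_n(x)]$ next to $-x \sin x/24$ for a few values of n.

```python
import math


def q_sum(x, n):
    """Left side of (3.13): (1/n) * sum_{k=1}^{n} cos((2k-1)x/(2n))."""
    return sum(math.cos((2 * k - 1) * x / (2 * n)) for k in range(1, n + 1)) / n


def q_closed(x, n):
    """Right side of (3.13); assumes sin(x/(2n)) != 0."""
    return math.sin(x) / (2 * n * math.sin(x / (2 * n)))


x = 3.0
for n in (10, 50, 200):
    scaled_error = n**2 * (math.sin(x) / x - q_sum(x, n))  # quantity inside the limit (3.15)
    print(n, abs(q_sum(x, n) - q_closed(x, n)), scaled_error, -x * math.sin(x) / 24)
```

The scaled error settles at $-x \sin x/24$ as n grows, which is the $n^{-2}$ behavior stated above.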

We point out that the upper bound in (3.14) and the limit (3.15) follow by a trivial application of a suggestive result in Polya-Szego [20, Pt. 2, Ch. 1, Pr. 11]. Regrettably, their method appears to apply here only to the special measure $V(\sigma) = \sigma$.

A further example seems to indicate that the convergence rate of the approximations $Q_n$ can lie between $n^{-1}$ and $n^{-2}$.

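Example 2. [4, p. 22] Let Λ be a finite positive real number. Define V for nonnegative arguments $\sigma \ge 0$ by

$$ V(\sigma) = \begin{cases} \sigma\bigl(1 - \sigma/(2\Lambda)\bigr), & 0 \le \sigma \le \Lambda, \\ \Lambda/2, & \Lambda < \sigma, \end{cases} $$

and for negative arguments by the relation $V(-\sigma) = -V(\sigma)$, $\sigma > 0$. Thus, V is an odd function whose derivative is $V'(\sigma) = 1 - |\sigma|/\Lambda$ for $|\sigma| \le \Lambda$. Obviously, for all $x \ne 0$,

$$ f(x) = \int_{-\Lambda}^{\Lambda} \cos \sigma x\, dV(\sigma) = \frac{2(1 - \cos \Lambda x)}{\Lambda x^2}, $$

and $f(0) = \Lambda$. Now V is invertible on the interval $[-\Lambda, \Lambda]$, and $V^{-1}$, for nonnegative arguments, is given by

$$ V^{-1}(t) = \Lambda\bigl(1 - \sqrt{1 - 2t/\Lambda}\bigr), \qquad 0 \le t \le \Lambda/2. $$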

From the construction indicated in Corollary 2.2,

$$ Q_n(x) = \frac{\Lambda}{n} \sum_{k=1}^{n} \cos\Bigl[\Lambda\Bigl(1 - \sqrt{1 - (k - \tfrac{1}{2})/n}\Bigr) x\Bigr], \tag{3.16} $$

and



$$ |f(x) - Q_n(x)| \le \frac{\Lambda^2 |x|}{n}, \qquad x \in \mathbb{R}. $$


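This estimate is not even close to the truth. In fact, an examination of Table 2 indicates that, for sufficiently large n, the best error bound may take the form

$$ |f(x) - Q_n(x)| \le \frac{K x^2}{(2n)^{3/2}} \tag{3.17} $$

for some constant K. In general, we speculate that if $V^{-1}$ satisfies a Lipschitz condition of order r, $0 < r < 1$, then the convergence rate is of order $1/n^{1+r}$. In Example 2, $V^{-1}(t)$ behaves like a square root near $t = \Lambda/2$, so $r = 1/2$ and the speculated rate $n^{-3/2}$ agrees with (3.17).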

Table 2. Inequality (3.17) for x = 10, Λ = 1, K = 10⁻²

n          Kx²/(2n)^(3/2)      f(x) − Q_n(x)
5          .316 × 10⁻¹         −.618 × 10⁻¹
10         .112 × 10⁻¹         −.117 × 10⁻¹
20         .395 × 10⁻²         −.153 × 10⁻²
40         .140 × 10⁻²          .649 × 10⁻⁴
80         .494 × 10⁻³          .163 × 10⁻³
160        .175 × 10⁻³          .907 × 10⁻⁴
320        .618 × 10⁻⁴          .400 × 10⁻⁴
640        .218 × 10⁻⁴          .160 × 10⁻⁴
1280       .772 × 10⁻⁵          .614 × 10⁻⁵
2560       .273 × 10⁻⁵          .229 × 10⁻⁵
5120       .965 × 10⁻⁶          .837 × 10⁻⁶
10240      .341 × 10⁻⁶          .303 × 10⁻⁶
20480      .121 × 10⁻⁶          .109 × 10⁻⁶
40960      .426 × 10⁻⁷          .389 × 10⁻⁷
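Table 2 can be regenerated directly from (3.16). The Python sketch below uses our own helper names and the table's parameters ($\Lambda = 1$, $x = 10$, $K = 10^{-2}$); it evaluates the closed form of f from Example 2 and the cosine sum (3.16), and prints both columns of the table for the first few values of n.

```python
import math


def f_example2(x, lam):
    """f(x) = 2(1 - cos(lam*x)) / (lam*x^2) from Example 2, with f(0) = lam."""
    if x == 0:
        return lam
    return 2.0 * (1.0 - math.cos(lam * x)) / (lam * x * x)


def q_example2(x, n, lam):
    """Cosine sum (3.16): (lam/n) * sum_k cos(lam*(1 - sqrt(1 - (k - 1/2)/n)) * x)."""
    total = 0.0
    for k in range(1, n + 1):
        beta_k = lam * (1.0 - math.sqrt(1.0 - (k - 0.5) / n))  # V^{-1}(lam*(2k-1)/(4n))
        total += math.cos(beta_k * x)
    return lam / n * total


x, lam, K = 10.0, 1.0, 1e-2
for n in (5, 10, 20, 40, 80, 160):
    bound = K * x**2 / (2 * n) ** 1.5                  # middle column of Table 2
    err = f_example2(x, lam) - q_example2(x, n, lam)   # last column of Table 2
    print(f"{n:6d}  {bound:10.3e}  {err:+10.3e}")
```

The printed values should agree with the corresponding rows of Table 2 to the three digits shown; larger n only costs a longer loop.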



IV.  Concluding  Remarks 

The proofs in this report depend heavily on the finite support of the measure $dV(\sigma)$, even though the construction of the approximations $S_n(x)$ and $Q_n(x)$ can be carried out without modification on infinite intervals as well, provided $V(\sigma)$ is bounded. Since these proofs cannot be adapted to infinite intervals, the effectiveness of the resulting approximations remains, in theory, an open question. Intuitively, however, it would seem that only our proofs are limited and that the underlying approximation process is generally valid.

In  computational  practice  the  function  V  is  usually  unknown.  In  many  cases, 
however,  the  given  function  f  does  possess  a  nicely  behaved  Fourier  transform  from 
which  V  can  be  readily  constructed.  The  Fourier  transform  of  f  can,  of  course,  be 
computed  accurately  and  efficiently  in  many  situations  using  fast  Fourier  transform 
(FFT)  methods. 

If, in the above, V is not monotonic but is of bounded variation on $\mathbb{R}$, exponential approximations can be constructed as follows. In this case, there exist monotone increasing functions $V^+$ and $V^-$ such that $V = V^+ - V^-$. For each $\Lambda > 0$, define the “bandlimiting” operator $B_\Lambda$ by

$$ B_\Lambda f(x) = \int_{-\Lambda}^{\Lambda} e^{i\sigma x}\, dV(\sigma), \qquad x \in \mathbb{R}. $$

Let I be any finite interval. Given $\epsilon > 0$, choose $\Lambda > 0$ so that $\|f - B_\Lambda f\| < \epsilon$, where the norm is the uniform norm over the interval I. Now, let the two functions

$$ B_\Lambda^{\pm} f(x) = \int_{-\Lambda}^{\Lambda} e^{i\sigma x}\, dV^{\pm}(\sigma), \qquad x \in \mathbb{R}, $$

be approximated, using either of the methods of this paper, by two finite exponential sums, say $E^{\pm}(x)$, so that

$$ \|B_\Lambda^{\pm} f - E^{\pm}\| < \epsilon. $$

Let $E(x) = E^+(x) - E^-(x)$. Since $B_\Lambda f = B_\Lambda^+ f - B_\Lambda^- f$, we have

$$ \begin{aligned} \|f - E\| &= \|(f - B_\Lambda f) + (B_\Lambda^+ f - E^+) - (B_\Lambda^- f - E^-)\| \\ &\le \|f - B_\Lambda f\| + \|B_\Lambda^+ f - E^+\| + \|B_\Lambda^- f - E^-\| < 3\epsilon. \end{aligned} $$

Thus, exponential sums of degree not greater than $\deg(E^+) + \deg(E^-)$ may be constructed to approximate f(x) with specified accuracy on the interval I.
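The decomposition argument can be tried numerically. The Python sketch below is an illustration under our own assumptions, not the report's procedure: the signed density $V'(\sigma) = \cos 3\sigma$ on $[-1, 1]$ is hypothetical (so $B_\Lambda f = f$ for $\Lambda = 1$), and instead of the minimal (Jordan) decomposition it uses $V^+ = V + M(\sigma + \Lambda)$ and $V^- = M(\sigma + \Lambda)$ with $M > \max|V'|$, which makes both parts strictly increasing, as Theorem 2 requires. Each part is tabulated on a fine grid, inverted by linear interpolation, and approximated with the construction of Theorem 2; all function names are ours.

```python
import bisect
import cmath
import math


def tabulate(sigma, density, shift):
    """Cumulative trapezoid integral of (density + shift) on the grid sigma, starting at 0."""
    V = [0.0]
    for k in range(1, len(sigma)):
        h = sigma[k] - sigma[k - 1]
        V.append(V[-1] + 0.5 * h * (density[k - 1] + density[k] + 2 * shift))
    return V


def invert(V, sigma, u):
    """Piecewise-linear V^{-1}(u) for a strictly increasing tabulated V."""
    k = min(max(bisect.bisect_left(V, u), 1), len(V) - 1)
    w = (u - V[k - 1]) / (V[k] - V[k - 1])
    return sigma[k - 1] + w * (sigma[k] - sigma[k - 1])


def theorem2_sum(x, sigma, V, n):
    """Q_n(x) of (3.3)-(3.5) for the measure dV, with V tabulated on sigma."""
    mass = V[-1] - V[0]
    nodes = [invert(V, sigma, V[0] + mass * (2 * k - 1) / (2 * n)) for k in range(1, n + 1)]
    return mass / n * sum(cmath.exp(1j * s * x) for s in nodes)


lam, M, n, m = 1.0, 2.0, 200, 20000
sigma = [-lam + 2 * lam * k / m for k in range(m + 1)]
density = [math.cos(3 * s) for s in sigma]   # hypothetical signed V'(sigma)
V_plus = tabulate(sigma, density, M)         # V + M(sigma + lam): strictly increasing
V_minus = tabulate(sigma, [0.0] * len(sigma), M)  # M(sigma + lam): strictly increasing


def E(x):
    """E = E+ - E-, each part built with the construction of Theorem 2."""
    return theorem2_sum(x, sigma, V_plus, n) - theorem2_sum(x, sigma, V_minus, n)


for x in (0.5, 1.0, 2.0):
    exact = math.sin(3 - x) / (3 - x) + math.sin(3 + x) / (3 + x)  # integral of cos(3s)e^{isx}, s in [-1, 1]
    print(x, abs(exact - E(x)))
```

The resulting exponential sum has $2n$ terms, in line with the degree bound $\deg(E^+) + \deg(E^-)$. The non-minimal decomposition was chosen only because it keeps both inverses well defined.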

It  is  well  known  [4,  p.  60]  that  if  f(x)  is  measurable,  then  Bochner’s  Theorem 
still  holds  for  almost  all  x,  but  not  necessarily  all  x  as  in  (1.2).  Results  similar  to  the 
results  of  this  report  can  also  be  proven  for  measurable  f  with  careful  attention  to 
certain  details;  however,  this  generalization  is  not  pursued  here.  Similarly, 
Bochner’s  Theorem  has  been  generalized  to  locally  compact  abelian  groups  [4,  p. 
72],  and  perhaps  the  basic  approaches  to  approximation  used  here  can  be  extended 
to  this  much  more  general  setting. 




We conclude by commenting that Bernstein's Theorem [21, p. 160] states that a necessary and sufficient condition for f(x) to be completely monotonic on the interval $(0, \infty)$ is that

$$ f(x) = \int_0^\infty e^{-\sigma x}\, dV(\sigma), \qquad 0 < x < \infty, $$

where $V(\sigma)$ is bounded and nondecreasing. It is evident that the methods employed in this report can be used, in a manner entirely analogous to the proof of Theorem 1, to develop exponential approximations. That is to say, whenever f(x) can be expressed as

$$ f(x) = \int_0^\Lambda e^{-\sigma x}\, dV(\sigma) $$

for some finite $\Lambda > 0$, there exist approximations of the form

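$$ T_n(x) = f(0) \sum_{k=1}^{n} a_k e^{-\sigma_k x}, $$

where

$$ a_k > 0, \qquad k = 1, \ldots, n, \tag{4.1} $$

$$ a_1 + \cdots + a_n = 1, \tag{4.2} $$

$$ \Lambda > \sigma_k > 0, \qquad k = 1, \ldots, n, \tag{4.3} $$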

|  f(x)  -  Tn(x)|  «  2f(0)  e_Ax/2t  0  <  x  <  oo  . 

(4.4) 

An  alternate  approach  for  approximation  of  completely  monotonic  functions  can 
be  found  in  [22]. 




V.  References 

1. J. D. Bruce and H. Unz, “Mechanical Quadratures to Synthesize Nonuniformly-Spaced Antenna Arrays,” Proc. of the IRE (Correspondence), vol. 50, Oct. 1962, p. 2128.

2. A. L. Maffett, “Array Factors with Nonuniform Spacing Parameter,” IRE Trans. on Ant. and Prop., vol. AP-10, March 1962, pp. 131-136.

3. J. Stewart, “Positive Definite Functions and Generalizations, an Historical Survey,” Rocky Mountain J. Math., vol. 6, no. 3, 1976, pp. 409-434.

4.  R.  R.  Goldberg,  Fourier  Transforms,  Cambridge  University  Press,  1970. 

5. R. Schaback, “Suboptimal Exponential Approximations,” SIAM J. Numer. Anal., vol. 16, no. 6, December 1979, pp. 1007-1018.

6. D. W. Kammler and R. J. McGlinn, “A Bibliography for Approximation With Exponential Sums,” J. Comp. and App. Math., vol. 4, no. 2, 1978, pp. 167-173.

7. N. I. Achieser, Theory of Approximation, Ungar Publishing Co., 1956.

8. G. J. van der Maas, “A Simplified Calculation of Dolph-Tchebycheff Arrays,” J. of Applied Physics, vol. 25, January 1954, pp. 121-124.

9. V. Barcilon and G. C. Temes, “Optimum Impulse Response and the van der Maas Function,” IEEE Trans. on Circuit Theory, vol. CT-19, July 1972, pp. 336-342.

10. R. J. Duffin and A. C. Schaeffer, “Some Properties of Functions of Exponential Type,” Bull. Amer. Math. Soc., vol. 44, 1938, pp. 236-240.

11.  A.  Erdelyi,  Editor,  Tables  of  Integral  Transforms,  vol.  1,  Bateman 
Manuscript  Project,  McGraw-Hill,  1954. 

12.  G.  Szego,  Orthogonal  Polynomials,  American  Mathematical  Society 
Colloquium  Publications,  vol.  23,  Fourth  Edition,  1978. 

13.  G.  Meinardus,  Approximation  of  Functions:  Theory  and  Numerical 
Methods,  Springer-Verlag,  1967. 

14.  F.  B.  Hildebrand,  Introduction  to  Numerical  Analysis,  First  Edition, 
McGraw-Hill,  1956. 

15.  R.  W.  Hamming,  Numerical  Methods  for  Scientists  and  Engineers,  First 
Edition,  McGraw-Hill,  1962. 

16.  W.  Gautschi,  “Construction  of  Gauss-Christoffel  Quadrature  Formulas,” 
Math.  Comp.,  vol.  22,  1968,  pp.  251-270. 

17.  W.  Gautschi,  “On  the  Construction  of  Gaussian  Quadrature  Rules  From 
Modified  Moments,”  Math.  Comp.,  vol.  24,  1970,  pp.  245-260. 

18.  R.  A.  Sack  and  A.  F.  Donovan,  “An  Algorithm  for  Gaussian  Quadrature 
Given  Modified  Moments,”  Numer.  Math.,  vol.  18,  1972,  pp.  465-478. 




19. E. Parzen, Modern Probability Theory and Its Applications, John Wiley and Sons, Inc., 1960.

20. G. Polya and G. Szego, Problems and Theorems in Analysis I, English Edition, Springer-Verlag, 1972.

21. D. V. Widder, The Laplace Transform, Princeton University Press, 1946.

22. D. W. Kammler, “Prony’s Method for Completely Monotonic Functions,” J. of Math. Anal. and Appl., vol. 57, 1977, pp. 560-570.




FREQUENCY  LINE  DETECTOR/TRACKERS 


Foreword 


Hidden Markov models (HMMs) are well known for their application to automatic speech recognition problems, where they are used to characterize the time variation of short-term Fourier spectra of the broadband speech signal. HMMs are useful outside the speech application as well. The application of HMMs to the problem of detecting and tracking time-varying frequency lines is presented in detail in paper [13]. One unique aspect of this approach is that tracks are automatically initiated and terminated as an intrinsic function of the underlying HMM algorithm. For this reason, the tracker is referred to as an HMM detector/tracker. Another interesting aspect is that the finite-state HMM enables the non-Gaussian nature of the measurement process to be modeled exactly. Papers [14] and [15] are the first to present the application of HMMs to frequency line detection and tracking.

Papers  [16]  and  [17]  describe  extensions  of  the  HMM  detector/tracker  presented  in 
[13]  to  include  the  exploitation  of  the  phase  and  amplitude  information  in  the  received 
signal.  The  inclusion  of  this  information  affects  the  state  conditional  measurement 
likelihood  functions,  but  it  does  not  alter  the  fundamental  character  of  the  HMM 
detector/tracker  algorithm. 

The  ability  of  HMM  detector/trackers  to  estimate  signal-to-noise  ratio  (SNR)  is 
documented  in  [18].  The  SNR  estimation  algorithm  is  a  maximum  likelihood  algorithm, 
and it is derived from a variation of the Baum-Welch training algorithm for general HMMs.

Detection  performance  can  be  studied  by  using  an  HMM  as  a  signal  source  model, 
either  matched  or  mismatched  to  the  HMM  of  the  detector/tracker.  The  detection  capability 
of  HMM  detector/trackers  was  first  studied  in  this  way  in  [19].  Paper  [20]  also  discusses 
the  use  of  HMMs  as  signal  sources. 

Various  advanced  and  special  purpose  computer  architectures  have  been  proposed 
for  implementation  of  HMMs  for  the  speech  application.  Paper  [21]  studies  HMM 
detector/tracker  implementations  on  the  Connection  Machine  supercomputer. 




Frequency  Line  Tracking 
Using  Hidden  Markov  Models 


R.  L.  Streit  and  R.  F.  Barrett 




586 


IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 38, NO. 4, APRIL 1990


Frequency Line Tracking Using Hidden Markov Models

ROY L. STREIT, SENIOR MEMBER, IEEE, AND ROSS F. BARRETT


Abstract—This paper demonstrates how the problem of frequency line tracking can be formulated in terms of hidden Markov models (HMM’s). Frequency cells comprising a subset, or gate, of the spectral bins from FFT processing are identified with the states of the hidden Markov chain. An additional zero state is included to allow for the possibility of track initiation and termination. Analytic expressions are obtained for the basic parameters of the HMM in terms of physically meaningful quantities, and optimization of the HMM tracker is carefully discussed. A measurement sequence based on a simple threshold detector forms the input to the tracker. The outputs of the HMM tracker are a discrete Viterbi track, a gate occupancy probability function, and a continuous mean cell occupancy track. The latter provides an estimate of the mean signal frequency as a function of time. The performance of the HMM tracker is evaluated for two sets of simulated data and is found to be remarkably good, comparing favorably to results from inspection of the signal spectrograms. A comparison of the HMM tracker to earlier, related trackers is presented, and possible extensions are discussed.


I.  Introduction 

THE estimation of the frequency of isolated tones embedded in a noise background is a problem of interest in diverse fields (e.g., seismology, radar, sonar, radio astronomy), and is currently receiving considerable attention in the signal processing literature (e.g., see [1]-[4]). In the case where the frequency of the tone is changing as a function of time, a related problem is that of accurately tracking these changes in frequency.

One obvious approach is to divide the time series into finite-sized blocks and to apply one of the many new frequency estimation techniques to the data in each block. The result is a sequence of independent frequency estimates which, if the signal-to-noise ratio (SNR) is reasonably high, provides an accurate estimate of the underlying frequency variations. However, as the SNR is reduced, the scatter in the frequency estimates becomes large, and “outliers,” or estimates far from the true frequency track, become common. A priori knowledge of the extent and rapidity of the likely frequency changes can be incorporated into an algorithm that rejects the highly improbable outliers and produces smoothed frequency estimates as a function of time. Such an algorithm is designated here a “frequency tracker.”

Manuscript received August 11, 1988; revised March 28, 1989.

R. L. Streit is with the Naval Underwater Systems Center, New London, CT 06320, on leave at the Weapons Systems Research Laboratory, Maritime Systems Division, Defence Science and Technology Organisation, Salisbury, South Australia, Australia.

R. F. Barrett is with the Weapons Systems Research Laboratory, Maritime Systems Division, Defence Science and Technology Organisation, Salisbury, South Australia, Australia.

IEEE Log Number 9034283.

The  purpose  of  this  paper  is  to  show  that  the  problem 
of  frequency  tracking  lends  itself  readily  to  formulation 
in  terms  of  a  hidden  Markov  model  (HMM).  These 
models  are  used  in  speech  applications  to  characterize  the 
time  variation  of  the  short-term  spectra  of  spoken  words. 
The  basic  principles  of  HMM’s  are  reviewed  in  Section 
II.  For  more  detailed  discussions  of  HMM’s,  the  reader 
is  referred  to  [5],  [6]  and  to  the  references  cited  therein. 

The HMM’s utilized in this paper are composed of two basic parts: a Markov chain, and a set of discrete finite-outcome random variables. The Markov chain has a finite number of states and is characterized by its transition probability matrix A. The elements of the A matrix are the probabilities of transitioning between the states of the Markov chain. The set of random variables is characterized by a measurement probability matrix B. The elements of the B matrix are the probabilities defining the probability density functions (pdf’s) of the finite-outcome random variables. Each state of the Markov chain is uniquely associated with one of the random variables.

The relevance of the HMM to frequency tracking is easy to see. The range of frequencies over which the track is allowed to wander is divided into a finite number of frequency cells, and each cell is associated with a state of the Markov chain. In addition, a zero state is included to allow for the possibility of the track wandering outside the allowed frequency range or terminating altogether. The A matrix represents our knowledge, based on past experience, of the likely extent of the frequency fluctuations, of the track terminating, or of its restarting after a previous termination. The inclusion of the zero state is an important feature, and its presence precludes a simple characterization of track variation as a Gaussian statistic.

The B matrix characterizes the connection between the underlying state at time t and the measurement at time t. For the HMM frequency tracker presented in this paper, the measurement takes the form of a detection. A detection is said to have occurred in a particular frequency cell at time t if the spectral power in that cell at that time exceeds a certain threshold D and is larger than the power in all other cells within the allowed frequency range. If the power in each cell is less than D, a detection in the zero state is said to have occurred. Since the B matrix connects the underlying states with the noisy measurements, it depends on the SNR and the nature of the background noise. For the HMM tracker presented here, the B matrix is computed analytically. The threshold detector applied in this way results in a measurement sequence that is highly non-Gaussian in character.
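As a concrete reading of the detection rule, the Python sketch below maps one time step of gate-cell powers to a measurement. It is an illustration only: the function name, the list layout of the spectral powers, the tie-breaking rule (lowest cell wins), and the treatment of a power exactly equal to D (counted here as no detection) are our assumptions.

```python
def threshold_detection(power, D):
    """Measurement for one time step from the simple threshold detector.

    power -- spectral power in the n gate cells; power[0] is cell 1, power[n-1] is cell n
    D     -- detection threshold
    Returns the index (1..n) of the cell with the largest power if that power exceeds D,
    otherwise 0 (a detection in the zero state). Ties go to the lowest cell index.
    """
    best = max(range(len(power)), key=lambda j: power[j])
    return best + 1 if power[best] > D else 0


# Hypothetical spectrogram: one row of gate-cell powers per time step.
spectrogram = [
    [0.2, 0.4, 0.1],   # all below D -> zero state
    [0.3, 2.5, 0.9],   # cell 2 exceeds D and is the largest
    [1.2, 0.8, 3.1],   # cell 3
]
Z = [threshold_detection(row, D=1.0) for row in spectrogram]
print(Z)  # [0, 2, 3]
```

Applying the rule row by row to a spectrogram yields the finite-outcome measurement sequence that the HMM tracker consumes.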

Once the connection between the HMM and the frequency tracking problem is correctly formulated, the wide body of existing knowledge on HMM’s is exploited to yield both discrete and continuous tracker outputs. The highly efficient Viterbi algorithm is used to obtain the maximum likelihood frequency track, conditioned on a given set of measurements. We refer to this track as the Viterbi track; it is the discrete output of the HMM tracker. The forward-backward algorithm is used to compute the mean cell occupancy track and the probability that no frequency track is present; both are conditioned on the measurement sequence. These are the continuous outputs of the HMM tracker.

The HMM tracker presented in this paper minimizes the effect of noise by looking at the overall track time history for global structure. In practice, only a fixed length T of the measured track is utilized; thus, T measurements are stored before the output HMM track is calculated. It does not follow, however, that the HMM tracker is a fixed-lag tracker with lag T − 1. It is shown in Section III-C that the HMM tracker can be used as a fixed-lag tracker with any lag from 0 to T − 1.

Section III describes the HMM frequency tracking algorithm in detail and discusses how the HMM tracker can be optimized for a particular application. The performance of the HMM tracker on simulated data is discussed in Section IV. Section V compares the HMM tracker with related work by Kopec [7] on formant tracking using HMM’s in the field of speech processing. Two earlier frequency trackers described by Scharf et al. [8] and Jaffer et al. [9] are also based on HMM’s, although this was not recognized at the time, and they are also discussed in Section V. Possible extensions of the HMM tracker are discussed in Section VI. The conclusions of the paper are presented in Section VII.

II.  Elements  of  Hidden  Markov  Models 

A finite Markov chain has a finite number n + 1 of states, where $n \ge 0$, and is characterized by its transition probability matrix, denoted $A = [a_{ij}]$, where $i, j = 0, 1, \ldots, n$. Let π be the initial state probability vector of the Markov chain. Thus, at the start time t = 1, the probability that the Markov chain is in state i is $\pi_i$. The probability that the chain transitions from state i at time t to state j at time t + 1 is $a_{ij}$, where $t = 1, 2, 3, \ldots$. Note that the transition probabilities $a_{ij}$ are independent of time t. Markov chains with an infinite number of states are not considered.

The tracking application presented in this paper requires only HMM’s with a finite number of different possible outcomes or measurements. However, we also consider HMM’s with measurements that are arbitrary complex-valued vectors because this kind of HMM is useful in some applications (see Section V). The pdf of the random variable uniquely associated with state i of the Markov chain is denoted by $b_i(z)$, where z is a measurement. Let B denote the vector $[b_i(z)]$. For finite-outcome HMM’s, we also use B to denote the measurement probability matrix $B = [b_{ij}]$, where $b_{ij} = b_i(z_j)$ and $z_j$ runs through the finite measurement set. This abuse of notation should not cause confusion. (What we call a measurement is referred to as a “symbol” in the speech literature [5], [6].)

Simulation of an HMM measurement sequence of length T, given π, A, and B, is straightforward. The initial state of the Markov chain is chosen according to the initial state probability vector π. The initial state uniquely determines the first pdf. The first measurement, say z(1), is chosen according to this first pdf. Next, the Markov chain transitions to another (or the same) state according to its transition probability matrix A. This state determines the second pdf, and the second measurement, say z(2), is chosen according to this second pdf. Continuing in this fashion up to time t = T generates the measurement sequence

$$ Z_T = \{z(1), z(2), \ldots, z(T)\}. \tag{2.1} $$

It is important to note that the only output from such an HMM simulator is the measurement sequence $Z_T$. The state sequence of the Markov chain is not an output.
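The simulation procedure above translates almost line for line into code. The Python sketch below assumes that states and measurements are integer indices and uses `random.choices` for the categorical draws; the function name is ours. Unlike the simulator described in the text, it also returns the hidden states, which is convenient for testing.

```python
import random


def simulate_hmm(pi, A, B, T, rng=random):
    """Generate a measurement sequence Z_T of a finite-outcome HMM.

    pi -- initial state probabilities pi_i, i = 0..n
    A  -- transition matrix, A[i][j] = a_ij
    B  -- measurement matrix, B[i][m] = b_i(z_m)
    Returns (Z, I); a true HMM simulator reports only Z, since the states I are hidden.
    """
    states = range(len(pi))
    symbols = range(len(B[0]))
    state = rng.choices(states, weights=pi)[0]                   # initial state from pi
    Z, I = [], []
    for t in range(T):
        if t > 0:
            state = rng.choices(states, weights=A[state])[0]     # transition according to A
        I.append(state)
        Z.append(rng.choices(symbols, weights=B[state])[0])      # draw from the state's pdf
    return Z, I


pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
Z, _ = simulate_hmm(pi, A, B, T=10, rng=random.Random(1))
print(Z)
```

Passing a seeded `random.Random` instance makes runs reproducible.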

From the HMM simulation procedure, it is clear that the total probability of a given measurement sequence is the sum

$$ P[Z_T] = \sum_{I} P[Z_T \mid I], \tag{2.2} $$

where $I = \{I(1), I(2), \ldots, I(T)\}$ denotes an arbitrary Markov chain state sequence of length T, and $P[Z_T \mid I]$ denotes the probability of $Z_T$, conditioned on knowledge of the state sequence. Explicitly, we have (consider the simulation procedure)

$$ P[Z_T \mid I] = \{\pi_{I(1)} b_{I(1)}[z(1)]\}\{a_{I(1),I(2)} b_{I(2)}[z(2)]\} \cdots \{a_{I(T-1),I(T)} b_{I(T)}[z(T)]\}. \tag{2.3} $$
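For very small models, (2.2) and (2.3) can be evaluated literally by enumerating all $(n+1)^T$ state sequences. The Python sketch below (our own function names; states and measurements are integer indices) does exactly that. Its cost grows exponentially in T, so it is useful only as a reference check for the efficient recursions.

```python
import itertools


def path_prob(Z, I, pi, A, B):
    """The product (2.3) for one state sequence I = (I(1), ..., I(T))."""
    p = pi[I[0]] * B[I[0]][Z[0]]
    for t in range(1, len(Z)):
        p *= A[I[t - 1]][I[t]] * B[I[t]][Z[t]]
    return p


def total_prob(Z, pi, A, B):
    """P[Z_T] as the sum (2.2) over all (n+1)**T state sequences."""
    states = range(len(pi))
    return sum(path_prob(Z, I, pi, A, B) for I in itertools.product(states, repeat=len(Z)))


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(total_prob([0, 1, 2], pi, A, B))
```

Summing `total_prob` over every possible measurement sequence of a fixed length gives 1, which is a quick sanity check on π, A, and B.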

As is intuitively clear from the HMM simulation procedure, some state sequences I are, in general, more likely than other state sequences to correspond to the given $Z_T$. In other words, some terms in the summation (2.2) are larger than others. Since the “true” state sequence corresponding to $Z_T$ is not observable, we define an optimality criterion and use it to select an “optimal” state sequence. We define an optimal state sequence to be any state sequence for which the probability $P[Z_T \mid I]$ is a maximum. Optimal state sequences are not necessarily unique because the maximum of $P[Z_T \mid I]$ may not be uniquely attained; however, the HMM’s developed in this paper appear to give unique optimal state sequences, except in situations that are of little or no importance in the application. Other definitions of optimality are possible and potentially useful (see [5]), but only the definition above is used in this paper.




We refer to an optimal state sequence corresponding to a given $Z_T$ as the Viterbi track, denoted $I_V[Z_T]$, and to the probability

$$ P_V[Z_T] = \max_{I} P[Z_T \mid I] \tag{2.4} $$

as the Viterbi score of $Z_T$. Both the Viterbi track and the Viterbi score are easily computed using the so-called Viterbi dynamic programming algorithm. The Viterbi algorithm gives the globally optimal state sequence, as required by (2.4). Furthermore, the computational complexity of the Viterbi algorithm is linear in T, the length of the measurement sequence. This makes it a very efficient algorithm in many applications.

For a given observation sequence (2.1), the Viterbi algorithm is defined as follows. For t = 1, define

$$ \delta_1(j) = \ln \pi_j + \ln b_j(z(1)), \qquad \psi_1(j) = \text{arbitrary}, \qquad 0 \le j \le n, \tag{2.5} $$

and, for $t = 2, 3, \ldots, T$, define

$$ \delta_t(j) = \ln b_j(z(t)) + \max_{0 \le i \le n} \{\delta_{t-1}(i) + \ln a_{ij}\}, \tag{2.6a} $$

$$ \psi_t(j) = \operatorname*{argmax}_{0 \le i \le n} \{\delta_{t-1}(i) + \ln a_{ij}\}, \tag{2.6b} $$

where the argmax function gives the smallest index i for which the maximum is attained. The logarithm of the Viterbi score is

$$ \ln P_V[Z_T] = \max_{0 \le j \le n} \{\delta_T(j)\}, \tag{2.7} $$

and the Viterbi track is given by $I_V[Z_T] = \{I_V(1), I_V(2), \ldots, I_V(T)\}$, where

$$ I_V(T) = \operatorname*{argmax}_{0 \le j \le n} \{\delta_T(j)\} \tag{2.8a} $$

and, for $t = T-1, T-2, \ldots, 1$,

$$ I_V(t) = \psi_{t+1}\bigl(I_V(t+1)\bigr). \tag{2.8b} $$

For greater computational efficiency, the natural logarithms of the components of π, A, and B are usually precomputed and stored for finite-outcome HMM’s.
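A direct transcription of (2.5)-(2.8) is sketched below in Python, working with precomputed logarithms as suggested above. The function names are ours, and zero probabilities are mapped to $-\infty$, an implementation choice the text does not discuss. Python's `max` with a key returns the first maximizer, which matches the smallest-index convention for argmax.

```python
import math


def _log(p):
    """Natural log with log(0) = -inf, so forbidden transitions never win the max."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(Z, pi, A, B):
    """Return (Viterbi track I_V, ln P_V[Z_T]) via (2.5)-(2.8); states are 0..n."""
    N = len(pi)
    logA = [[_log(a) for a in row] for row in A]
    logB = [[_log(b) for b in row] for row in B]
    delta = [_log(pi[j]) + logB[j][Z[0]] for j in range(N)]       # (2.5)
    psi = [[0] * N]                                               # psi_1 is arbitrary
    for t in range(1, len(Z)):
        back, new_delta = [], []
        for j in range(N):
            scores = [delta[i] + logA[i][j] for i in range(N)]
            i_best = max(range(N), key=scores.__getitem__)        # smallest maximizing i (2.6b)
            back.append(i_best)
            new_delta.append(logB[j][Z[t]] + scores[i_best])      # (2.6a)
        delta = new_delta
        psi.append(back)
    last = max(range(N), key=delta.__getitem__)                   # (2.8a)
    track = [last]
    for t in range(len(Z) - 1, 0, -1):                            # backtracking (2.8b)
        track.append(psi[t][track[-1]])
    track.reverse()
    return track, delta[last]                                     # delta[last] is (2.7)


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(viterbi([0, 1, 2, 2, 0], pi, A, B))
```

Each time step costs $(n+1)^2$ additions and comparisons, so the total work is linear in T.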

The so-called forward-backward algorithm is used to provide state occupancy information on the hidden Markov chain state sequence. We define the forward probabilities $\alpha_t(j)$ by

$$ \alpha_t(j) = P[z(1), \ldots, z(t) \text{ and } I(t) = j], \qquad 1 \le t \le T, \tag{2.9a} $$

and the backward probabilities $\beta_t(j)$ by

$$ \beta_t(j) = P[z(t+1), z(t+2), \ldots, z(T) \mid I(t) = j], \qquad 1 \le t \le T-1. \tag{2.9b} $$

The probabilities $\alpha_t(j)$ are calculated with the recursion

$$ \alpha_1(j) = \pi_j b_j(z(1)), \qquad \alpha_t(j) = b_j(z(t)) \sum_{i=0}^{n} \alpha_{t-1}(i)\, a_{ij}, \quad t = 2, \ldots, T, \tag{2.10a} $$

and the probabilities $\beta_t(j)$ are calculated with the recursion

$$ \beta_T(j) = 1, \qquad \beta_t(j) = \sum_{i=0}^{n} a_{ji}\, b_i(z(t+1))\, \beta_{t+1}(i), \quad t = T-1, \ldots, 1. \tag{2.10b} $$

We define the state occupancy probabilities at time t by

$$ \gamma_t(i) = \alpha_t(i)\, \beta_t(i) / P[Z_T] \tag{2.11} $$

so that

$$ \sum_{i=0}^{n} \gamma_t(i) = 1. $$

The state occupancy probability $\gamma_t(i)$ is interpreted as the probability that the Markov chain occupies state i at time t, conditioned on the measurement sequence. We shall see in Section III how the state occupancy probabilities are used to define the continuous output of the HMM frequency tracker.
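Unscaled, the recursions (2.10a) and (2.10b) underflow quickly in floating point for long sequences, so in practice they are rescaled at every step. The Python sketch below uses the common per-step normalization of the forward probabilities; that particular scaling scheme is our assumption, not taken from the text. The scale factors cancel in (2.11), and their logarithms sum to $\ln P[Z_T]$. Function names are ours, and the sketch assumes $P[Z_T] > 0$.

```python
import math


def forward_backward(Z, pi, A, B):
    """Return (gamma, ln P[Z_T]) using scaled versions of (2.10a), (2.10b) and (2.11).

    gamma[t][i] is the state occupancy probability of state i at (0-based) time t.
    Assumes P[Z_T] > 0, so no scale factor is zero.
    """
    N, T = len(pi), len(Z)
    alpha, scale = [], []
    a = [pi[j] * B[j][Z[0]] for j in range(N)]
    for t in range(T):
        if t > 0:
            prev = alpha[-1]
            a = [B[j][Z[t]] * sum(prev[i] * A[i][j] for i in range(N)) for j in range(N)]
        c = sum(a)                         # the product of all scale factors equals P[Z_T]
        scale.append(c)
        alpha.append([v / c for v in a])   # scaled alpha_t sums to 1
    beta = [[1.0] * N for _ in range(T)]   # beta_T(j) = 1
    for t in range(T - 2, -1, -1):
        nxt = beta[t + 1]
        beta[t] = [sum(A[j][i] * B[i][Z[t + 1]] * nxt[i] for i in range(N)) / scale[t + 1]
                   for j in range(N)]
    gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]
    return gamma, sum(math.log(c) for c in scale)


pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
gamma, log_p = forward_backward([0, 1, 2, 2, 0], pi, A, B)
print([round(sum(row), 12) for row in gamma], log_p)
```

Each row of `gamma` sums to 1, as required by the normalization following (2.11).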

The computational complexity of the Viterbi algorithm is $(n+1)^2 T$ additions if the natural logarithms of the components of the B matrix can be stored. If the measurement pdf vector B must be computed for each symbol in $Z_T$, then the complexity is $[(n+1)^2 + c_1]T$ additions, where $c_1$ is the complexity (measured in units equivalent to addition) of computing the natural logarithms of the components of the vector B for an arbitrary measurement.

The computational complexity of the forward-backward algorithm is $(n+1)^2 T$ multiplications if the B matrix is stored and $[(n+1)^2 + c_2]T$ multiplications if the measurement pdf vector B is computed for each measurement in $Z_T$, where $c_2$ is the complexity (measured in units equivalent to multiplication) of computing the components of the vector B for an arbitrary measurement. The imperative need to rescale to prevent underflow (discussed in [6]) requires an additional $(n+1)T$ divisions.

An important concept in the application of HMM’s is “training.” In the case when many different measurement sequences are known for the same HMM, maximum likelihood estimates of the model parameters π, A, and B can be computed using the so-called Baum-Welch reestimation algorithm. Training is not used for the frequency tracker presented in this paper. Further background information on HMM’s, including a discussion of training, is found in [5], [6], [10] and in the references cited therein.

III. Frequency Tracking with Hidden Markov Models

A permissible frequency track is defined to be any state sequence that is a realization of the Markov chain characterized by $\pi$ and the A matrix. The states representing frequency cells are numbered from 1 to $n$ and are referred to as the nonzero states. The collection of nonzero states is called the gate, and the gate size is said to be $n$. The unique state representing the absence of the frequency track is numbered 0 and is referred to as the zero state. Track initiation and track termination are defined to occur whenever the state sequence transitions out of and into the zero state, respectively. The SNR does not affect the transition probabilities between the nonzero states because these transitions are related only to possible frequency track variations inside the gate; however, the SNR does influence track initiation and termination. An important issue in optimizing the performance of the HMM tracker is the definition of the row and column of the A matrix corresponding to transitions out of and into the zero state. This issue is discussed in Section III-D.

STREIT AND BARRETT: FREQUENCY LINE TRACKING USING HMM'S

The frequency track is not directly observable, except at infinite SNR, and is inferred from measured data. The measurements are random functions of the frequency state, and the pdf's of these random functions constitute the B matrix of the HMM tracker. As discussed in Section I, a simple threshold detector is used to estimate which frequency cell, if any, in the gate is occupied by the frequency track. The measurements are, therefore, estimates of the state of the Markov chain. By setting a detection threshold $D$ to control false alarms, it is possible to observe the zero state. The size of $D$ depends on the SNR, but it also affects track initiation and termination. For example, if $D$ is too large, no detections are made and only the zero state track (i.e., no track) is measured. Consequently, setting the detection threshold is an important issue in optimizing the performance of the HMM tracker. This issue is discussed in Section III-D.

The threshold detector is not the only detector possible, but it is one that is commonly used in practice when frequency cells (usually FFT bins) constitute the tracker input. Besides its simplicity, our main purpose in using the threshold detector is to show that the HMM tracker is appropriate whenever it is possible to estimate (analytically or otherwise) the B matrix from some underlying model of the physical process. The performance of the frequency tracker is obviously tied closely to the particular detector selected.

The HMM tracker accepts as input a measured track (i.e., a sequence of measured states) and produces both discrete and continuous output tracks. The discrete output is the Viterbi track or maximum likelihood estimate of the state sequence. The Viterbi track is necessarily a valid track, i.e., the Viterbi track is a realization of the Markov chain characterized by $\pi$ and A. However, the Viterbi track is not a typical random walk because it is conditioned on the measurements. For example, the Viterbi track, the true track, and the measured track all coincide for infinite SNR. In this paper, the HMM tracker is set up so that “smooth” tracks have a higher likelihood than “rough” tracks at all finite SNR's.

The continuous outputs of the HMM tracker consist of the mean cell occupancy (MCO) track, denoted $f_M(t)$, and the gate occupancy probability (GOP) function, denoted $G_0(t)$. They are defined by


Gc(r)  =  1  -  7,(0)  =  Z  7,(i) 

i  -  1 

(3.1) 

«i 

/*(f)  =  Z  7,(0//Go(0 

i  »  l 

(3.2) 

where the $\gamma_t(i)$ are the state occupancy probabilities given by (2.11) and the $\bar{f}_i$ are the center frequencies of the cells. Note that the sums in (3.1) and (3.2) do not include the term $i = 0$; hence, the MCO track is conditioned on the track not occupying the zero state at time $t$. The MCO track is continuously variable in the frequency range spanned by the gate. The standard deviation $\sigma_M(t)$ associated with the MCO track is defined by

$$\sigma_M^2(t) = \sum_{i=1}^{n} \gamma_t(i)\, [\bar{f}_i - f_M(t)]^2 / G_0(t). \qquad (3.3)$$

The MCO track is undefined whenever the GOP function is identically zero because, in this case, the track occupies the zero state with unit probability, i.e., the track has terminated. In practice, the MCO track should be terminated whenever the GOP function is near zero. This can be done by setting a threshold for $G_0(t)$ or by setting a threshold for the MCO track standard deviation $\sigma_M(t)$. Alternatively, the decision can be based on the state of the Viterbi track at time $t$. Examples of the Viterbi track, the MCO track, and the GOP function are given in Section IV.
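Given the state occupancy probabilities at one time step, (3.1) to (3.3) reduce to weighted sums. A minimal sketch, assuming `gamma_t[0]` holds the zero-state probability and `centers` lists the cell center frequencies $\bar{f}_1, \ldots, \bar{f}_n$. The termination threshold `gop_min` is a hypothetical user choice; the text only says that a threshold on $G_0(t)$ may be set.

```python
import math

def mco_gop(gamma_t, centers):
    """GOP (3.1), MCO frequency (3.2) and its standard deviation (3.3) at one time.

    gamma_t[0] is the zero-state probability; gamma_t[1:] pair with the cell
    centers. Returns (gop, f_m, sigma_m); f_m and sigma_m are None if gop == 0.
    """
    in_gate = gamma_t[1:]
    gop = sum(in_gate)
    if gop <= 0.0:
        return gop, None, None
    f_m = sum(g * f for g, f in zip(in_gate, centers)) / gop
    var = sum(g * (f - f_m) ** 2 for g, f in zip(in_gate, centers)) / gop
    return gop, f_m, math.sqrt(var)

def mco_track(gamma, centers, gop_min=0.5):
    """MCO track, marked None (terminated) wherever G_0(t) < gop_min."""
    track = []
    for gamma_t in gamma:
        gop, f_m, _ = mco_gop(gamma_t, centers)
        track.append(f_m if gop >= gop_min else None)
    return track
```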

Once a transition into the zero state occurs, the track is terminated. If the track is reinitiated in the gate at some later time, the question arises as to whether or not these two tracks correspond to the same frequency track. The resolution of this question depends on the particular application and on how much additional information is available. It is outside the scope of the present paper.

In  the  remainder  of  this  section,  we  discuss  in  turn  the 
detailed  mathematical  structure  of  each  of  the  components 
of  the  HMM  tracker.  The  important  topic  of  optimization 
of  the  tracker  is  fully  discussed  in  Section  III-D. 

A. Definition of the Transition Probability Matrix

In terms of the A matrix, the probability of track initiation into frequency cell $j$ is given by the transition probability $a_{0j}$, and the probability of track termination out of frequency cell $j$ is given by $a_{j0}$. For the HMM tracker presented here, it is assumed that the track initiation and termination probabilities, denoted $u$ and $v$, respectively, are independent of the state $j$ within the gate, i.e., for $j = 1, 2, \ldots, n$, we have

$$a_{0j} = u/n \qquad (3.4a)$$

$$a_{j0} = v. \qquad (3.4b)$$

If the A matrix is to be a valid transition probability matrix, each of its rows must sum to unity. Thus, from (3.4a), we have

$$a_{00} = 1 - u. \qquad (3.4c)$$

IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 38, NO. 4, APRIL 1990

This completes the definition of row and column 0 of the A matrix; by (3.4b), the entries of each row $i > 0$ that lie in the gate must sum to $1 - v$. The best choice of $u$ and $v$ depends on the particular application and, in principle, these parameters can be determined by training the HMM. Alternatively, in Section III-D, it is shown how to choose $u$ and $v$ to optimize the performance of the HMM tracker without training. In this section, however, $u$ and $v$ are treated as free parameters.

Let the $i$th cell in the frequency domain be denoted by

$$[f_i, f_{i+1}], \qquad i = 1, 2, \ldots, n$$

where $-\infty < f_1 < f_2 < \cdots < f_{n+1} < +\infty$. The center frequency $\bar{f}_i$ of the $i$th cell is then given by $\bar{f}_i = (f_i + f_{i+1})/2$. If the frequency track lies in the $i$th cell at the current time step, the location of the track at the next time step is assumed to be characterized by a Gaussian distribution with mean $\bar{f}_i$ and standard deviation $d$, where $d$ is a measure of the “process” noise. Hence, the probability that the frequency shifts from the $i$th cell to the $j$th cell at the next time step is $g_{ij}$, where

$$g_{ij} = (2\pi d^2)^{-1/2} \int_{f_j}^{f_{j+1}} \exp\left\{-\frac{1}{2}\left(\frac{f - \bar{f}_i}{d}\right)^2\right\} df. \qquad (3.5)$$

Note that $g_{ij}$ is not a function of the SNR.

The natural definition of the transition probabilities between the nonzero states of the Markov chain is

$$\hat{a}_{ij} = (1 - v)\, g_{ij} \Big/ \sum_{k=1}^{n} g_{ik}, \qquad i, j = 1, 2, \ldots, n. \qquad (3.6)$$

However, this definition results in an “unbalanced” gate, i.e., the diagonal elements $\hat{a}_{ii}$ are not independent of $i$ for $i > 0$. If the gate is sufficiently unbalanced, then in certain instances, the Viterbi track can be skewed toward the outer cells in the gate. A moment's reflection shows that the problem is caused by the finite gate size: for a cell near the edge of the gate, part of the Gaussian mass in (3.5) falls outside the gate, and the normalization in (3.6) scales up the remaining terms to compensate, so frequency cells at the edge of the gate have the largest self-transition probabilities. The problem is not significant when the standard deviation $d$ of the process noise is fairly small compared to a cell width, but it grows progressively more severe as $d$ gets larger.
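The edge effect is easy to see numerically. The sketch below evaluates (3.5) through the Gaussian CDF (using `math.erf`) and forms the natural matrix (3.6). The nine unit-width cells, $d$ of one cell width, and $v = 0.1$ are arbitrary illustrative values, and the function names are hypothetical.

```python
import math

def gaussian_cell_probs(edges, d):
    """g[i][j] of (3.5): mass of N(center_i, d^2) inside cell [edges[j], edges[j+1]]."""
    n = len(edges) - 1
    centers = [(edges[i] + edges[i + 1]) / 2.0 for i in range(n)]

    def cdf(x, mu):  # Gaussian CDF with standard deviation d
        return 0.5 * (1.0 + math.erf((x - mu) / (d * math.sqrt(2.0))))

    return [[cdf(edges[j + 1], centers[i]) - cdf(edges[j], centers[i]) for j in range(n)]
            for i in range(n)]

def natural_transitions(g, v):
    """a_hat[i][j] of (3.6): each row of g rescaled so that it sums to 1 - v."""
    return [[(1.0 - v) * gij / sum(row) for gij in row] for row in g]

# Nine unit-width cells, d of one cell width, v = 0.1 (illustrative values).
a_hat = natural_transitions(gaussian_cell_probs(list(range(10)), 1.0), 0.1)
diag = [a_hat[i][i] for i in range(9)]
print(round(diag[0], 3), round(diag[4], 3))  # 0.498 0.345
```

The edge cell keeps a self-transition probability of about 0.498 against about 0.345 at midgate, because roughly 31% of its Gaussian mass falls outside the gate and is renormalized away.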

To overcome the unbalanced gate problem, the transition probabilities between the nonzero states of the HMM tracker are derived from the natural probabilities $\hat{a}_{ij}$ and the termination probability $v$ in the following manner. Define

$$\hat{a}_{\min} = \min_{1 \le i \le n} \hat{a}_{ii}.$$

Now, for $i > 0$, elements $\{a_{i1}, \ldots, a_{in}\}$ of the $i$th row of the A matrix are obtained from the row vector $\hat{a}_i = (\hat{a}_{i1}, \ldots, \hat{a}_{in})$ as follows. Replace the “inner” element $\hat{a}_{ii}$ with $\hat{a}_{\min}$ and normalize the “outer” $n - 1$ elements so that $\hat{a}_i$ sums to $1 - v$; denote this vector by $c_1$. If no element of $c_1$ exceeds $\hat{a}_{\min}$, then stop. Otherwise, replace the “inner” elements $\hat{a}_{i,i-1}$, $\hat{a}_{ii}$, and $\hat{a}_{i,i+1}$ of $\hat{a}_i$ with $\hat{a}_{\min}$ and normalize the “outer” $n - 3$ elements so that $\hat{a}_i$ sums to $1 - v$; denote this vector by $c_2$. If no component of $c_2$ exceeds $\hat{a}_{\min}$, then stop. Otherwise, continue the algorithm until a vector, say $c_k$, is found whose components do not exceed $\hat{a}_{\min}$. The $i$th row of the A matrix is then defined in partitioned form by $(v, c_k)$. In this algorithm, if indexes $j < 1$ or $j > n$ are encountered, the corresponding element is ignored. Note that the A matrix defined in this fashion is balanced and the transition probabilities $a_{ij}$ are as nearly equal to the natural probabilities $\hat{a}_{ij}$ as possible without unbalancing the gate.
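The balancing procedure can be sketched as follows. The sketch assumes the natural matrix $\hat{a}_{ij}$ is given as a list of rows over the gate (0-based cell indices). The text does not address the case in which the inner block covers the whole gate or leaves no mass for the outer cells, so this sketch raises an error there rather than guessing. The function name is illustrative.

```python
def balance_rows(a_hat, v):
    """Balanced nonzero-state block of A from the natural probabilities (3.6).

    a_hat is an n x n list of rows (0-based cells) whose rows sum to 1 - v.
    Row i of the full A matrix is then (v, result[i]).
    """
    n = len(a_hat)
    a_min = min(a_hat[i][i] for i in range(n))
    result = []
    for i, row in enumerate(a_hat):
        k = 0
        while True:
            # "Inner" cells i-k..i+k; indexes outside the gate are ignored.
            inner = set(range(max(0, i - k), min(n, i + k + 1)))
            outer = [j for j in range(n) if j not in inner]
            remaining = (1.0 - v) - a_min * len(inner)
            outer_mass = sum(row[j] for j in outer)
            if not outer or remaining < 0.0 or outer_mass <= 0.0:
                # Case not addressed in the text; this sketch does not guess.
                raise ValueError(f"cannot balance row {i}")
            c = [a_min if j in inner else row[j] * remaining / outer_mass
                 for j in range(n)]
            if max(c[j] for j in outer) <= a_min:  # no outer element exceeds a_min
                result.append(c)
                break
            k += 1
    return result
```

Every diagonal element of the result equals $\hat{a}_{\min}$, which is what makes the gate balanced.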

B.  Derivation  of  the  Measurement  Probability  Matrix 

The fact that at a given time the signal frequency lies in a prescribed frequency cell does not necessarily mean that a detection occurs in that cell. The presence of random noise can result in some other cell within the gate fortuitously recording greater spectral power than the power in the correct cell. Alternatively, if no cell within the gate records a power greater than the preset threshold $D$, then a detection is registered in the zero state. The B matrix, therefore, depends on the background noise characteristics, the SNR, and the threshold $D$. In this section, the SNR and the threshold $D$ are treated as free parameters. How they are chosen to optimize tracker performance is discussed in Section III-D.

It is assumed that the data time series is of the form

$$z(t_0 + kT_s) = A \sin[p(t_0 + kT_s) + \xi] + \epsilon_k \qquad (3.7)$$

where $t_0$ is the initial time and $T_s$ is the sampling period. It is also assumed that the signal amplitude $A$, phase $\xi$, and angular frequency $p$ remain constant over the period $NT_s$, which is the data acquisition time for a Fourier transform of size $N$. The noise is taken to be zero-mean and Gaussian in nature so that

$$\langle \epsilon_k \epsilon_j \rangle = \sigma^2 \delta_{kj} \qquad (3.8)$$

where the angular brackets denote an ensemble average, $\delta$ denotes the Kronecker delta function, and $\sigma^2$ is the variance of the noise.

The discrete Fourier transform, denoted $x(q)$, at angular frequency $q$ of the time series in (3.7) is given by

$$x(q) = \frac{1}{N} \sum_{k=0}^{N-1} z(t_0 + kT_s) \exp(-jqkT_s). \qquad (3.9)$$

Transforming the complex variable $x(q)$ into polar coordinates gives

$$x(q) = R e^{j\psi} \qquad (3.10a)$$

$$x(q) = C e^{j\phi} + D e^{j\theta} \qquad (3.10b)$$

where $(R, \psi)$ denotes the amplitude and phase of $x(q)$. The amplitudes and phases of the signal and noise components of $x$ are denoted by $(C, \phi)$ and $(D, \theta)$, respectively. From (3.7), (3.9), and (3.10), it follows that

$$C = (A/2N) \sin[N(p - q)T_s/2] \,/\, \sin[(p - q)T_s/2] \qquad (3.11a)$$

and

$$\phi = (N - 1)(p - q)T_s/2 + p t_0 + \xi - \pi/2. \qquad (3.11b)$$

The pdf of the amplitude $R$ is given by

$$P(R) = (2RN/\sigma^2)\, I_0(2RCN/\sigma^2) \exp[-N(R^2 + C^2)/\sigma^2] \qquad (3.12)$$

where $I_0$ is the modified Bessel function of the first kind of order zero. Note that $P(R)$ is a noncentral Rayleigh (Rician) density function.

The Fourier transform in (3.9) is normally calculated only at the discrete values $q_i$, $i = 1, 2, \ldots, n$, where $n$ is the gate size. For a detection to be registered for a particular observation frequency $q_m$, two requirements must be met. First, the amplitude $R_m$ of the Fourier transform at frequency $q_m$ must be larger than the amplitudes $R_{i \ne m}$ at all other frequencies $q_{i \ne m}$ within the gate, and second, the amplitude $R_m$ must exceed the prescribed threshold $D$. If $R_i < D$ for all $i$ within the gate, a detection is registered in the zero state (i.e., $m = 0$).
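The detection rule just stated maps the gate's DFT amplitudes to one measured state per time step. A minimal sketch, assuming `amplitudes` holds $R_1, \ldots, R_n$ in gate order. Ties between equal amplitudes, which the text does not discuss, go to the lowest-numbered cell here. The function name is illustrative.

```python
def threshold_detect(amplitudes, D):
    """Measured state: index 1..n of the largest amplitude if it exceeds D, else 0."""
    m = max(range(len(amplitudes)), key=lambda i: amplitudes[i])
    return m + 1 if amplitudes[m] > D else 0

print(threshold_detect([0.3, 1.7, 0.9], D=1.0))  # 2
print(threshold_detect([0.3, 0.7, 0.9], D=1.0))  # 0 (zero state)
```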

To simplify the problem, we assume the case where the discrete angular frequencies $q_i$ are given by

$$q_i = 2\pi i/(NT_s) \qquad (3.13)$$

and where the true signal angular frequency $p$ lies close to one of the $q_i$ (say, $q_m$), i.e.,

$$p = 2\pi m/(NT_s). \qquad (3.14)$$

Substituting (3.13) and (3.14) into (3.11a), we have in this case

$$C = (A/2)\, \delta_{im}. \qquad (3.15)$$
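Here is a quick numerical check of (3.15), and of the phase (3.11b), with the noise term of (3.7) set to zero. The record length, sampling period, amplitude, phase, and bin index are arbitrary illustrative values, and `dft_at` is a hypothetical helper that evaluates (3.9) directly. The negative-frequency image of the real sinusoid lands in bin $N - m$, so it does not disturb the bins checked here.

```python
import cmath
import math

def dft_at(samples, q, Ts):
    """x(q) of (3.9): (1/N) * sum_k z(t0 + k*Ts) * exp(-j*q*k*Ts)."""
    N = len(samples)
    return sum(zk * cmath.exp(-1j * q * k * Ts) for k, zk in enumerate(samples)) / N

# Noise-free record with the signal exactly on bin m (illustrative values).
N, Ts, t0, A, xi, m = 64, 1e-3, 0.0, 2.0, 0.3, 5
p = 2 * math.pi * m / (N * Ts)                        # (3.14)
z = [A * math.sin(p * (t0 + k * Ts) + xi) for k in range(N)]
x_m = dft_at(z, 2 * math.pi * m / (N * Ts), Ts)       # (3.13) with i = m
x_7 = dft_at(z, 2 * math.pi * 7 / (N * Ts), Ts)       # a cell without the signal
print(round(abs(x_m), 6), round(abs(x_7), 6))         # 1.0 0.0
```

The signal cell has amplitude $A/2 = 1$ and phase $pt_0 + \xi - \pi/2$, as (3.11b) predicts when $p = q$.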

Substitution of (3.15) into (3.12) leads to two separate expressions for the pdf:

$$i = m: \quad P_1(R_i) = (2NR_i/\sigma^2)\, I_0(AR_iN/\sigma^2) \exp[-N(4R_i^2 + A^2)/(4\sigma^2)] \qquad (3.16a)$$

$$i \ne m: \quad P_2(R_i) = (2NR_i/\sigma^2) \exp[-NR_i^2/\sigma^2]. \qquad (3.16b)$$

Equation (3.16a) describes the pdf of $R_i$ in the case when the true signal lies in the $i$th frequency cell, and (3.16b) represents the situation when no signal is present in that cell.

We are now in a position to calculate the various elements $B_{mi}$ of the B matrix. We consider first the case where there is no signal present in the frequency cells within the gate (i.e., $m = 0$). If the amplitudes measured in all cells lie below the threshold (i.e., $R_i < D$ for all $i$), then no detection is registered. The probability of this event defines the B-matrix element $B_{00}$. From (3.16), we have

$$B_{00} = \prod_{i=1}^{n} \int_0^D P_2(r)\, dr = [1 - \exp(-D^2N/\sigma^2)]^n. \qquad (3.17)$$

It is possible, however, that the random noise contribution to the time series results in the amplitude in one or more of the cells being larger than the threshold $D$. This situation corresponds to a false alarm. Because the $n$ noise-only cells are statistically identical, each is equally likely to hold the largest amplitude, so the probability of a false alarm in the $i$th cell is given by

$$B_{0i} = (1 - B_{00})/n, \qquad i = 1, 2, \ldots, n. \qquad (3.18)$$

In the case where a signal is present in one of the cells within the gate (i.e., $m = 1, 2, \ldots, n$), we have three disjoint cases:

1) a detection is registered in the correct cell (i.e., $i = m$);

2) a detection is registered in an incorrect cell (i.e., $i \ne m$, $i \ne 0$);

3) no detection is registered in any cell (i.e., $i = 0$).

The probabilities $B_{mm}$, $B_{mi}$ ($i \ne m$), and $B_{m0}$ corresponding to these three cases are given by the expressions

$$B_{mm} = \int_D^{\infty} P_1(r)\, [1 - \exp(-r^2N/\sigma^2)]^{n-1}\, dr \qquad (3.19a)$$

$$B_{m0} = [1 - \exp(-D^2N/\sigma^2)]^{n-1} \int_0^D P_1(r)\, dr \qquad (3.19b)$$

$$B_{mi} = [1 - B_{m0} - B_{mm}]/(n - 1). \qquad (3.19c)$$

This  completes  the  definition  of  the  B  matrix. 
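The B matrix can be assembled numerically from (3.16) to (3.19). This is a minimal sketch under stated assumptions. `lam` stands for $N/\sigma^2$, the only combination of $N$ and $\sigma$ that appears. $I_0$ is evaluated by its power series, which is adequate only for moderate arguments. The integrals use composite Simpson's rule, and the upper limit of (3.19a) is cut off at a hypothetical `r_max` beyond which the Rician tail is negligible. All names are illustrative.

```python
import math

def bessel_i0(x):
    """I_0(x) from its power series sum_k (x/2)^(2k) / (k!)^2; fine for moderate x."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, m))
    return s * h / 3.0

def b_matrix(n, A, lam, D):
    """(n+1) x (n+1) B matrix; row m is the true state, column i the measured state."""
    def E(r):   # integral of P_2 from 0 to r, i.e. 1 - exp(-lam r^2)
        return -math.expm1(-lam * r * r)

    def P1(r):  # (3.16a) with N / sigma^2 = lam
        return 2.0 * lam * r * bessel_i0(lam * A * r) * math.exp(-lam * (r * r + A * A / 4.0))

    r_max = A / 2.0 + 10.0 / math.sqrt(lam)                      # assumed cut-off
    B00 = E(D) ** n                                              # (3.17)
    B0i = (1.0 - B00) / n                                        # (3.18)
    Bmm = simpson(lambda r: P1(r) * E(r) ** (n - 1), D, r_max)   # (3.19a)
    Bm0 = E(D) ** (n - 1) * simpson(P1, 0.0, D)                  # (3.19b)
    Bmi = (1.0 - Bm0 - Bmm) / (n - 1)                            # (3.19c)
    B = [[B00] + [B0i] * n]
    for m in range(1, n + 1):
        B.append([Bm0] + [Bmm if i == m else Bmi for i in range(1, n + 1)])
    return B
```

A useful sanity check: with $A = 0$ the signal cell is indistinguishable from the others, so $B_{mm} = B_{mi} = [1 - B_{00}]/n$ and $B_{m0} = B_{00}$.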

C. The Initial State Probability Vector

The final component of the HMM tracker is the initial state probability vector $\pi$. The best choice for $\pi$ depends on the application. For instance, when the entire measurement set $Z_T$ is utilized, as in the examples in Section IV, it is appropriate to choose $\pi$ to be independent of the measurements. In this case, a good strategy is to force automatic track initiation by starting in the zero state; thus, $\pi_i = a_{0i}$ for $i = 0, \ldots, n$. Alternatively, $\pi$ can be taken equal to the long-term state occupancy probability vector $\mu$ for the A matrix, where $\mu$ is defined by the matrix equation $\mu A = \mu$. (Normalized to sum to unity, the vector $\mu$ exists uniquely and is nonnegative because the A matrix is positive.) In this paper, however, we utilize the former choice because of its simplicity and because it makes the HMM tracker perform like a kind of detector.
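Both choices of $\pi$ are simple to compute. A minimal sketch, assuming A is a list of rows indexed $0, \ldots, n$. Power iteration is one way to solve $\mu A = \mu$ (it converges here because A is positive); the names and tolerances are illustrative. The usage lines build A from (3.4) for a hypothetical two-cell gate.

```python
def zero_state_start(A):
    """pi_i = a_{0i}: start the chain in the zero state, so pi is row 0 of A."""
    return list(A[0])

def stationary(A, tol=1e-14, max_iter=100000):
    """Long-term occupancy vector mu with mu A = mu, by power iteration from uniform."""
    S = len(A)
    mu = [1.0 / S] * S
    for _ in range(max_iter):
        nxt = [sum(mu[i] * A[i][j] for i in range(S)) for j in range(S)]
        if max(abs(a - b) for a, b in zip(mu, nxt)) < tol:
            return nxt
        mu = nxt
    raise RuntimeError("power iteration did not converge")

# Two-cell gate with u = 0.2, v = 0.1 (illustrative); rows follow (3.4a)-(3.4c).
u, v = 0.2, 0.1
A = [[1 - u, u / 2, u / 2],
     [v, 0.6, 0.3],
     [v, 0.3, 0.6]]
pi0 = zero_state_start(A)   # [0.8, 0.1, 0.1]
mu = stationary(A)          # mu[0] is v / (u + v) = 1/3
```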

In applications where measurements are taken of an ongoing track, it is reasonable to suppose that the measurement set $Z_T$ consists of the most recent $T$ measurements, where $T$ is fixed. As indicated in the previous paragraph, for the first data set, the HMM tracker should use an initial probability vector that is independent of the data. For subsequent data sets, however, $\pi$ should be updated so that it is dependent on earlier measurements. In effect, the updated $\pi$ characterizes the impact of the track history on the HMM track estimates for the current measurement set. We describe two updating methods that depend on the fact that time $t + 1$ for the previous measurement set is identical to time $t$ for the current set. The simplest update assumes that the Viterbi track for the previous measurement set is correct at time $t = 1$. If $i_V(1)$ denotes the state of the Viterbi track at time $t = 1$, then the $\pi$ update for use with the current measurement set is taken to be row $i_V(1)$ of the A matrix. Alternatively, the state occupancy probability vector $\gamma_2(i)$, $i = 0, \ldots, n$, from the previous measurement set at time $t = 2$ can be used as the $\pi$ update for the current measurement set. This method is computationally less efficient than the former method; however, based on simulated data, it seems to give slightly more accurate estimated MCO tracks.
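The two update rules can be sketched as follows, assuming the previous window's Viterbi track and state occupancy probabilities are stored as 0-based lists, so that time $t = 1$ is index 0 and $t = 2$ is index 1. The function names are illustrative.

```python
def pi_from_viterbi(A, prev_viterbi):
    """Simplest update: trust the previous Viterbi state at t = 1 and use that row of A."""
    return list(A[prev_viterbi[0]])

def pi_from_gamma(prev_gamma):
    """Alternative update: the previous window's occupancy vector gamma_2(i)."""
    return list(prev_gamma[1])
```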

The HMM tracker is a fixed interval tracker, i.e., for each input measurement sequence $Z_T$, the output sequence is an estimated track at each time $t = 1, 2, \ldots, T$. When overlapped measurement sets are utilized, as in the preceding paragraph, any time $T_E$ in the output HMM track sequence can be chosen to correspond to the track estimate for the current measurement set. For example, we define the GOP function with lag $T_L = T - T_E$ by

$$G_0(t; T_E) = 1 - \gamma_{T_E}(0).$$

Similar definitions can be made for the MCO track and the Viterbi track. If we choose $T_E = 1$, the HMM tracker functions as a fixed lag tracker with a lag $T_L = T - 1$. If $T_E = T$, the HMM tracker has no lag, i.e., $T_L = 0$. Based on simulated data, it would appear that the variance in the MCO track increases as $T_E$ increases from 1 to $T$; however, this subject is outside the scope of the present paper and is not discussed further.

D.  Optimization  of  the  HMM  Tracker 

The performance of the HMM tracker is completely determined by the parameters $u$, $v$, and $d$ characterizing the A matrix and the parameters $D$ and SNR characterizing the B matrix. However, it is not intuitively obvious how to go about setting reasonable numerical values for all these parameters. We propose the following approach. The process noise parameter $d$ and the SNR parameter are each selected independently of the other parameters in the straightforward manner discussed below. The remaining three parameters, however, are interdependent and are selected by solving three nonlinear equations in three unknowns. One equation is derived from an optimal detection criterion, and the other two equations are derived from optimal tracking criteria. All three optimality criteria are discussed below.

The process noise standard deviation $d$ is similar to the process noise term in a standard Kalman filter. The smaller the value of $d$, the straighter the frequency track is assumed to be; the larger $d$ becomes, the more the frequency track looks like uniformly distributed noise in the gate (even at infinite SNR). For frequency line tracking, an estimate of $d$ can be derived from an estimate of the stability of the line. If such estimates are not available, the best value to use for $d$ can be assessed by trial and error. Examples in Section IV show the effect of different values of $d$ on the output of the HMM tracker.

The HMM tracker is a time-invariant optimal tracker of an intermittent signal that has a specified SNR whenever it is present. We refer to the specified SNR as the tracker SNR. If the true SNR is greater than the tracker SNR, the HMM tracker may interpret genuine frequency changes in the measurement sequence as random noise, so that the estimated tracks may be too smooth and may persist after the true signal has terminated. On the other hand, if the true SNR is smaller than the tracker SNR, the HMM tracker may interpret random noise in the measurement sequence as being incompatible with the assumed process noise, and the net result may be premature track termination. Both effects may occur if the true SNR is fluctuating above and below the tracker SNR over the measurement sequence $Z_T$. For robust HMM tracking, the tracker SNR should be set somewhat smaller than the estimated mean of the true SNR when avoidance of premature termination is critical in the application, and greater than this estimate when avoidance of belated termination is more important. Alternatively, the tracker SNR can be set by trial and error, just as the process noise standard deviation $d$ is often selected. Examples in Section IV show the effects of different tracker SNR's on the HMM tracker outputs. It is seen that the HMM tracker is reasonably insensitive to SNR mismatch.

To define an optimal detection criterion for the HMM tracker and to set the detection threshold, we use the long-term state occupancy probability vector, denoted $\mu = (\mu_0, \mu_1, \ldots, \mu_n)$, of the A matrix. We then show that, for optimal detection, the threshold $D$ is a function of the other parameters defining the HMM tracker. The value of $D$ influences the frequency of occurrence of false alarms and false dismissals in the detection process. We define a false alarm as a measurement (i.e., a detection) in a nonzero state when the zero state is the true state. Thus, the probability of a single false alarm in state $i > 0$ is $B_{0i}$. Since we have assumed that the A matrix characterizes the frequency track, the long-term total false alarm probability $P_{FA}$ is given by

$$P_{FA} = \mu_0 \sum_{i=1}^{n} B_{0i} = \mu_0 (1 - B_{00}) \qquad (3.20)$$

where the last expression in (3.20) follows because the B-matrix rows sum to unity. Similarly, a false dismissal is defined as a measurement in the zero state when a nonzero state is the true state. Thus, if the true state is $i > 0$, the probability of a single false dismissal is $B_{i0}$, and the long-term total false dismissal probability $P_{FD}$ is given by

$$P_{FD} = \sum_{i=1}^{n} \mu_i B_{i0} = (1 - \mu_0) B_{i0}. \qquad (3.21)$$

The last expression follows from the fact that $B_{i0}$ is independent of the index $i$ for $i > 0$.

Let $\alpha$ and $\beta$ be nonnegative numbers that add to unity and that represent the relative importance (in the specific application) of false alarms and false dismissals. The error detection criterion is defined by

$$C_{ED} = \alpha P_{FA} + \beta P_{FD}. \qquad (3.22)$$

The optimal threshold is the value of $D$ that minimizes $C_{ED}$; thus, since $\mu_0$ is independent of $D$, we require $D$ to satisfy

$$\frac{\partial C_{ED}}{\partial D} = -\alpha \mu_0 \frac{\partial B_{00}}{\partial D} + \beta (1 - \mu_0) \frac{\partial B_{i0}}{\partial D} = 0. \qquad (3.23)$$

For the A matrices used in this paper, it is easy to show, using (3.4), that

$$\mu_0 = v/(u + v). \qquad (3.24)$$
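The step behind (3.24) is short. Take the zero-state component of $\mu A = \mu$, use $a_{00} = 1 - u$ from (3.4c) and $a_{i0} = v$ from (3.4b), and use $\sum_{i=0}^{n} \mu_i = 1$:

$$\mu_0 = \mu_0 a_{00} + \sum_{i=1}^{n} \mu_i a_{i0} = \mu_0 (1 - u) + (1 - \mu_0)\, v \quad\Longrightarrow\quad u\,\mu_0 = v\,(1 - \mu_0) \quad\Longrightarrow\quad \mu_0 = \frac{v}{u + v}.$$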

Substituting (3.24) into (3.23) and then differentiating (3.17) and (3.19b), it follows that the optimal threshold satisfies the nonlinear equation

$$P_1(D) = \frac{2nND}{\sigma^2}\, \frac{\alpha v}{\beta u} \left[1 - \frac{n-1}{n}\, \frac{\beta u}{\alpha v}\, \frac{B_{i0}}{B_{00}}\right] \exp(-D^2N/\sigma^2) \qquad (3.25)$$

where $P_1(D)$ is given by (3.16a). Equation (3.25) can be solved by a variety of simple iteration procedures, e.g., the bisection method. Obvious modifications are required in (3.25) if either $\alpha v = 0$ or $\beta u = 0$.
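What follows is a bisection sketch for (3.25), written in the equivalent multiplied-out form $P_1(D) = (2ND/\sigma^2)[\,n\alpha v/(\beta u) - (n-1)B_{i0}/B_{00}\,]\exp(-D^2N/\sigma^2)$. It assumes $\alpha v > 0$ and $\beta u > 0$ and writes `lam` for $N/\sigma^2$. By (3.17) and (3.19b), $B_{i0}/B_{00} = \int_0^D P_1(r)\,dr \,/\, [1 - \exp(-D^2N/\sigma^2)]$. The default bracket $[10^{-6}, A]$ is a hypothetical choice that the code checks for a sign change. The helpers repeat the power-series $I_0$ and Simpson rule so the block stands alone; all names are illustrative.

```python
import math

def bessel_i0(x):
    """I_0(x) by power series; adequate for the moderate arguments used here."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def simpson(f, a, b, m=400):
    """Composite Simpson's rule with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, m))
    return s * h / 3.0

def optimal_threshold(n, A, lam, u, v, alpha, beta, lo=1e-6, hi=None, iters=60):
    """Solve (3.25) for the threshold D by bisection (lam = N / sigma^2)."""
    k = alpha * v / (beta * u)

    def P1(r):  # (3.16a)
        return 2.0 * lam * r * bessel_i0(lam * A * r) * math.exp(-lam * (r * r + A * A / 4.0))

    def residual(D):
        E = -math.expm1(-lam * D * D)        # 1 - exp(-D^2 N / sigma^2)
        ratio = simpson(P1, 0.0, D) / E      # B_i0 / B_00
        rhs = 2.0 * lam * D * (n * k - (n - 1) * ratio) * math.exp(-lam * D * D)
        return P1(D) - rhs

    hi = A if hi is None else hi
    f_lo, f_hi = residual(lo), residual(hi)
    if f_lo * f_hi > 0.0:
        raise ValueError("(3.25) has no sign change on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = residual(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

Raising $\alpha$ relative to $\beta$ (weighting false alarms more heavily) moves the root of (3.25) to a larger threshold, as one would expect.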

An interesting special case of (3.25) occurs for $\alpha = u/(u + v)$ and $\beta = v/(u + v)$, for which $\alpha v/(\beta u) = 1$. The optimal threshold is then independent of $u$ and $v$ and dependent only on the tracker SNR. For these values of $\alpha$ and $\beta$, the criterion $C_{ED}$ is physically meaningful because it emphasizes false alarms when the frequency track is unlikely to be in the zero state (i.e., $\mu_0$ is small) and emphasizes false dismissals when the track is unlikely to be in the gate (i.e., $\mu_0$ is large). This choice of $\alpha$ and $\beta$ is not necessarily the best choice for the specific application, however, and we do not pursue the matter further here.

As the tracker SNR goes to infinity, the ratio $B_{i0}/B_{00}$ goes to zero. Neglecting this ratio in (3.25) gives the simpler approximate expression

$$P_1(D) = (2nND/\sigma^2)(\alpha v/\beta u) \exp[-D^2N/\sigma^2]. \qquad (3.26)$$

This expression must be solved iteratively for $D$, but it does not require the evaluation of integrals as does (3.25). The threshold satisfying (3.26) is independent of the tracker SNR, and dependent on $u$ and $v$.

To obtain a threshold from (3.25), we must first specify $u$ and $v$. We use the GOP function to define optimal tracker initiation and termination criteria. We then show that, for optimal tracking, the parameters $u$ and $v$ are functions of the other parameters defining the HMM tracker. An exemplar tracker initiation measurement sequence is defined by

$$z_t = \begin{cases} [(n+1)/2], & t = [T/2] - [L_B/2] + 1, \ldots, [T/2] - [L_B/2] + L_B \\ 0, & \text{otherwise} \end{cases} \qquad (3.27)$$

where $[x]$ denotes the greatest integer less than or equal to $x$. This exemplar sequence consists of midgate measurements of duration $L_B$ in the center of a string of zero state measurements. Let $G_{0B}(t)$ denote the GOP function corresponding to (3.27), with the HMM tracker started in the zero state. Intuitively, $G_{0B}(t) = 0$ at time $t = 1$, rises monotonically to some maximum value at time $t = T/2$, and thereafter decreases monotonically to time $t = T$. The tracker initiation criterion is defined by

$$C_{0B} = \max_{1 \le t \le T} G_{0B}(t) \qquad (3.28)$$

and its optimal value is defined by $C_{0B} = 1/2$. With this optimality criterion, the HMM tracker gives a 50% probability that the exemplar sequence (3.27) is identified as a track by the GOP. In other words, sequences of measurements in the gate of duration less than $L_B$ are treated by the tracker as likely false alarms, and sequences of duration greater than $L_B$ are treated as likely new tracks.

Similarly, an exemplar tracker termination measurement sequence is defined by

$$z_t = \begin{cases} 0, & t = [T/2] - [L_E/2] + 1, \ldots, [T/2] - [L_E/2] + L_E \\ [(n+1)/2], & \text{otherwise.} \end{cases} \qquad (3.29)$$

This exemplar sequence experiences a “drop out” of duration $L_E$ in the center of a midgate measurement sequence. Let $G_{0E}(t)$ denote the GOP function corresponding to (3.29), with the HMM tracker started in state $[(n+1)/2]$. Intuitively, $G_{0E}(t) = 1$ at time $t = 0$, falls monotonically to some minimum value at time $t = T/2$, and thereafter increases monotonically to time $t = T$. The tracker termination criterion is defined by

$$C_{0E} = \min_{1 \le t \le T} G_{0E}(t) \qquad (3.30)$$

and its optimal value is defined by $C_{0E} = 1/2$. HMM trackers satisfying this optimality criterion give a 50% probability that the exemplar sequence (3.29) is not considered to be a track by the GOP. Thus, sequences of zero state measurements of duration less than $L_E$ are treated by the tracker as likely false dismissals, and sequences of duration greater than $L_E$ are treated as likely terminated tracks.
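The two exemplar sequences are straightforward to generate. A minimal sketch, using 1-based times as in (3.27) and (3.29) and returning a list whose entry $t - 1$ is $z_t$. Python's `//` floor division plays the role of $[\cdot]$ for the nonnegative arguments used here. Computing $C_{0B}$ and $C_{0E}$ themselves would additionally require running the tracker on these sequences. The function names are illustrative.

```python
def initiation_exemplar(T, n, L_B):
    """(3.27): midgate state [(n+1)/2] for L_B steps centred in the window, zero elsewhere."""
    mid, start = (n + 1) // 2, T // 2 - L_B // 2 + 1
    return [mid if start <= t < start + L_B else 0 for t in range(1, T + 1)]

def termination_exemplar(T, n, L_E):
    """(3.29): a centred drop-out of L_E zero-state measurements in a midgate sequence."""
    mid, start = (n + 1) // 2, T // 2 - L_E // 2 + 1
    return [0 if start <= t < start + L_E else mid for t in range(1, T + 1)]

print(initiation_exemplar(10, 9, 4))   # [0, 0, 0, 5, 5, 5, 5, 0, 0, 0]
print(termination_exemplar(10, 9, 3))  # [5, 5, 5, 5, 0, 0, 0, 5, 5, 5]
```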

IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 38, NO. 4, APRIL 1990

Optimal values of u and v are determined by first selecting durations $L_B$ and $L_E$ that are suitable for the specific application, and then solving the three nonlinear equations (3.25), $C_{0B} = 1/2$, and $C_{0E} = 1/2$ simultaneously for D, u, and v. This set is equivalent to a system in only u and v because D is given as a function of u and v by (3.25) and because d and the tracker SNR are already specified. $C_{0B}$ and $C_{0E}$ are readily computed for any given pair of (u, v) values using the HMM tracker. Consequently, straightforward iteration procedures can be used to find optimal values for u and v. In practice, we proceed by using the HMM tracker to compute the right-hand sides of (3.28) and (3.30) for a small grid of (u, v) pairs after first computing D using (3.25) at each gridpoint. The grid is adjusted by inspection until near-optimal values of $C_{0B}$ and $C_{0E}$ are found.
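The grid procedure just described can be sketched as a plain search over (u, v) pairs. In this sketch the HMM tracker is abstracted as a caller-supplied function `criteria(u, v)` that returns the pair $(C_{0B}, C_{0E})$. That function, the scoring rule (total deviation from 1/2), and the name `best_grid_point` are all assumptions for illustration. The paper adjusts its grid by inspection rather than by a fixed rule. In a real run, `criteria` would compute D from (3.25), run the tracker on the two exemplar sequences, and return the right-hand sides of (3.28) and (3.30).

```python
import itertools
from typing import Callable, Iterable, Optional, Tuple


def best_grid_point(
    criteria: Callable[[float, float], Tuple[float, float]],
    u_values: Iterable[float],
    v_values: Iterable[float],
) -> Optional[Tuple[float, float, float]]:
    """Return (u, v, score) minimising |C_0B - 1/2| + |C_0E - 1/2| over a grid.

    `criteria` is a hypothetical stand-in for running the HMM tracker on
    the initiation and termination exemplars at the given (u, v).
    """
    best = None
    for u, v in itertools.product(u_values, v_values):
        c0b, c0e = criteria(u, v)
        score = abs(c0b - 0.5) + abs(c0e - 0.5)
        if best is None or score < best[2]:
            best = (u, v, score)
    return best
```

The grid can then be refined around the returned point and the search repeated, which mirrors the "adjust and re-evaluate" loop described in the text.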

IV. Application of HMM Frequency Tracker to Simulated Data

The performance of the HMM frequency tracker is evaluated by application to simulated data. The advantage of simulated data over real data is that the underlying, “hidden” state sequence in this case is precisely known, thus enabling the objective assessment of the tracker performance.

For the purposes of evaluation, two sets of simulated data are generated. Each set consists of a frequency-modulated sine wave added to white Gaussian noise. The frequency excursions of the modulation span five of the frequency cells employed in the Markov model. In the examples considered here, the gate size is set to nine, and the gate is centered about the mean signal frequency. The total number of states in the Markov chain is thus ten, counting the zero state.

The frequency modulation characteristics for the two data sets are displayed in Fig. 1. The y axis is divided into the nine discrete frequency cells employed in the HMM, and the x axis is divided into 100 time steps. The true signal frequency as a function of time is indicated by the cell occupancy track. For Fig. 1(a), the signal is present throughout the total time period, while in Fig. 1(b), it is present only intermittently.

Intensity-modulated “spectrograms” for the two data sets are presented in Fig. 2. The signals of Fig. 1 are added to white noise, and the power spectra of the resultant sums are then calculated. The power spectra are shown as a function of time in the spectrograms of Fig. 2. The SNR values in Fig. 2(a) and (b) are -23 and -20 dB, respectively. The underlying, hidden state sequence is difficult to discern in Fig. 2.
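The simulated data described in the two preceding paragraphs can be approximated along the following lines. This is a sketch under stated assumptions, not the authors' generator. Only three details come from the text: the five-cell frequency excursion, the white Gaussian noise, and the 100 time steps. The FFT length, the sinusoidal modulation law, the modulation period, and the SNR convention (tone power divided by per-sample noise variance) are our own choices. The function name is hypothetical.

```python
import numpy as np


def simulate_spectrogram(T=100, nfft=256, centre_bin=64, excursion=2,
                         period=50, snr_db=-20.0, seed=0):
    """Power spectra of a frequency-modulated tone in white Gaussian noise.

    Hypothetical set-up: one FFT block of `nfft` samples per time step; the
    instantaneous frequency swings sinusoidally by +/- `excursion` bins about
    `centre_bin` (2 * excursion + 1 = 5 cells); SNR is tone power divided by
    per-sample noise variance. Returns an array of shape (T, nfft // 2 + 1).
    """
    rng = np.random.default_rng(seed)
    n = np.arange(T * nfft)
    # Instantaneous frequency in FFT bins, one modulation cycle per `period` steps.
    bins = centre_bin + excursion * np.sin(2 * np.pi * n / (period * nfft))
    phase = 2 * np.pi * np.cumsum(bins / nfft)     # integrate frequency to phase
    tone = np.cos(phase)                           # tone power = 1/2
    noise_std = np.sqrt(0.5 / 10 ** (snr_db / 10))
    x = tone + rng.normal(0.0, noise_std, size=n.size)
    blocks = x.reshape(T, nfft)                    # one row per time step
    return np.abs(np.fft.rfft(blocks, axis=1)) ** 2
```

With the defaults shown, a nine-cell gate centred on the mean frequency would be `S[:, 60:69]`, where `S = simulate_spectrogram()`.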

The various outputs of the HMM tracker for the continuous signal frequency are displayed, together with the measurement sequence, in Fig. 3. The measurement sequence is shown in Fig. 3(c). At each time step, it indicates the cell containing the maximum spectral power if that power exceeds a prescribed threshold, or the zero state if the power in no cell exceeds the threshold. The zero state is shown slightly displaced from the other frequency cells, and in this example remains unoccupied at all times. The measurement sequence forms the only data input to the HMM tracker.
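The measurement rule just described can be written in a few lines. This is a minimal sketch, assuming the gated spectra are supplied as a T × n array of cell powers and that a single scalar detection threshold applies to every cell. We call the threshold D, following its use as the detection threshold in Section VI and in the figure captions. Its units, and the function name, are assumptions.

```python
import numpy as np


def measurement_sequence(power: np.ndarray, D: float) -> np.ndarray:
    """Threshold-detector measurements from gated power spectra (sketch).

    power : array of shape (T, n); power[t, k] is the spectral power in
            gate cell k + 1 at time step t.
    D     : detection threshold (assumed scalar, same units as `power`).
    Returns an integer array of length T: 1..n for the peak cell when it
    exceeds D, and 0 (the zero state) otherwise.
    """
    peak = np.argmax(power, axis=1)                       # strongest cell, 0-based
    detected = power[np.arange(power.shape[0]), peak] > D
    return np.where(detected, peak + 1, 0)
```

For instance, with powers `[[0.1, 0.9, 0.2], [0.1, 0.2, 0.3], [2.0, 0.5, 0.1]]` and `D = 0.5`, the measurements are `[2, 0, 1]`.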


Fig. 1. The continuous (a) and intermittent (b) true signals used for the investigation of the HMM tracker. The frequency cells are marked along the y axis and the time divisions along the x axis.


Fig. 2. Intensity-modulated “spectrograms” for the signals of Fig. 1. The signals are embedded in white Gaussian noise. SNR values of -23.0 and -20.0 dB are used for (a) and (b), respectively.


Fig. 3. Results of applying the HMM tracker to the data of Fig. 2(a). Parameters employed are: tracker SNR = -23 dB, d = 0.333, u = 0.24, v = 0.016, D = 0.0278. (a) The output of the MCO tracker; the shaded area contains all the tracks within one standard deviation of the mean track. (b) The probability of zero state occupancy [i.e., $\gamma_t(0) = 1 - G_0(t)$] as a function of time. (c) The measurement sequence; the cells corresponding to the zero state are shown underneath the cells of the gate. (d) The Viterbi track; the cells corresponding to the zero state are shown underneath the cells of the gate.


The MCO track is shown in Fig. 3(a). The shaded area marks the bounds lying one standard deviation estimate $\sigma_M(t)$ on either side of the optimal track. The MCO track does not directly estimate the possibilities of track initiation and termination. However, track initiation and termination can be included by defining a threshold on the MCO track standard deviation, or by calculating the GOP function and setting an appropriate threshold there. The probability of zero state occupancy, i.e., $1 - G_0(t)$, is shown in Fig. 3(b). This probability starts high because the HMM tracker is initialized in the zero state.
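Given posterior state probabilities $\gamma_t(j)$, with j = 0 for the zero state and 1..n for the gate cells, the quantities plotted in Fig. 3(a) and (b) might be computed as below. The MCO definition is not restated in this section. For illustration, we assume the MCO track is the posterior mean cell index over the gate cells, renormalized to exclude the zero state, and that $\sigma_M(t)$ is the matching standard deviation. This normalization is our assumption and not necessarily the paper's definition. The function name is hypothetical.

```python
import numpy as np


def mco_track(gamma: np.ndarray):
    """Mean cell occupancy track, its standard deviation, and 1 - G_0(t).

    gamma : array (T, n + 1) of posterior state probabilities; column 0 is
            the zero state, columns 1..n are the gate cells.
    Assumption: mean and spread are taken over the gate cells only, after
    renormalising to exclude the zero state (rows must have some gate mass).
    """
    p_zero = gamma[:, 0]                           # gamma_t(0) = 1 - G_0(t)
    cells = np.arange(1, gamma.shape[1])           # cell indices 1..n
    gate = gamma[:, 1:]
    w = gate / gate.sum(axis=1, keepdims=True)     # renormalised gate weights
    mean = w @ cells
    var = np.maximum(w @ cells**2 - mean**2, 0.0)  # guard against round-off
    return mean, np.sqrt(var), p_zero
```

The shaded band of Fig. 3(a) would then correspond to `mean - std` through `mean + std` at each time step.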

STREIT AND BARRETT: FREQUENCY LINE TRACKING USING HMM'S

The Viterbi track for this data set is displayed in Fig. 3(d), and is seen to provide an excellent reconstruction of the true state sequence of Fig. 1(a). The Viterbi track does not initiate until three time steps after the start of the data sequence. The delay is explained by the initiation of the HMM tracker in the zero state and by the large fluctuations in the measurement sequence at the start of the sequence. This delay is consistent with the high probability of zero state occupancy in Fig. 3(b) and the large variance in the MCO track over the corresponding period. The parameters used in the HMM tracker for the continuous signal frequency are listed in the caption of Fig. 3.

The results from the application of the HMM tracker to the intermittent signal frequency [Fig. 1(b)] are displayed in Fig. 4. Fig. 4(a)-(d) correspond to the analogous results shown in Fig. 3(a)-(d) for the continuous signal. For the intermittent signal, the Viterbi track is seen once again to provide an excellent reconstruction of the true state sequence of Fig. 1(b). In this case, the capability of the Viterbi track to terminate and initiate as the signal drops out and reappears is clearly demonstrated. From Fig. 4(a), it is seen that the MCO track follows the true state sequence closely during the periods when the signal is present, and exhibits a large variance when the signal is absent. Similarly, the probability of zero state occupancy is also large when the signal is absent. A threshold on the MCO track standard deviation or on the probability of zero state occupancy could be used to implement track termination and initiation for the MCO track. The results would then agree closely with those obtained from the Viterbi track. The HMM parameters used for these examples are listed in the caption to Fig. 4.
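The thresholding idea in the preceding paragraph can be sketched as follows. A track is declared present whenever the zero-state occupancy probability falls below a threshold, and each maximal run of "present" steps becomes one track segment with an initiation time and a termination time. The 0.5 default threshold and the function name are illustrative assumptions, not values prescribed by the paper.

```python
from typing import List, Sequence, Tuple


def track_segments(p_zero: Sequence[float],
                   threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Inclusive (initiation, termination) index pairs where p_zero < threshold.

    p_zero    : probability of zero state occupancy at each time step.
    threshold : assumed decision level; below it a track is declared present.
    """
    segments = []
    start = None
    for t, p in enumerate(p_zero):
        if p < threshold and start is None:
            start = t                          # track initiation
        elif p >= threshold and start is not None:
            segments.append((start, t - 1))    # track termination
            start = None
    if start is not None:                      # track still alive at the end
        segments.append((start, len(p_zero) - 1))
    return segments
```

For example, `track_segments([0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.4])` yields two segments, `(2, 3)` and `(5, 6)`.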

As we have indicated earlier, for optimal performance of the HMM tracker, the parameters of the HMM should represent as closely as possible the characteristics of the line being tracked. The process noise parameter d is a measure of the likelihood of the track changing frequency. In Fig. 5, the Viterbi track for the continuous signal is presented as the value of the process noise parameter d is decreased from 0.667 to 0.167. For the highest value of d, the track tends to follow the measurement sequence too closely, with the result that the Viterbi track exhibits a fine structure that is not present in the true state sequence. On the other hand, when d is small, the tracker fits the measurement sequence by a series of straight line segments, separated by track terminations and reinitiations. In this case, the tracker finds it less “costly” to terminate and reinitiate the track than to allow the track to step to an adjacent frequency cell. We refer to this phenomenon of termination and reinitiation as “punctuation.” The value of 0.333 for d seems to provide the optimal track for this data set.

In Fig. 6, the effect of varying the tracker SNR is investigated for the intermittent signal of Fig. 1(b). In this case, the SNR of the true signal is -20 dB, and the Viterbi track exhibits optimal results when the tracker SNR is equal to the true SNR. If the tracker SNR is less than the true SNR [e.g., Fig. 6(a)], the tracker is less likely to terminate, even when the true signal is absent. If the tracker SNR is considerably larger than the true SNR, the tracker is likely to terminate even when the signal is still present (because the HMM was designed for a stronger signal). It is also more likely to follow the measurement sequence too closely (because the HMM attaches excessive importance to each measurement). These two modes of behavior can be seen in Fig. 6(d).

Fig. 4. As for Fig. 3, but for the data of Fig. 2(b). Parameters employed are: tracker SNR = -20 dB, d = 0.333, u = 0.3, v = 0.0092, D = 0.0386.

Fig. 5. The Viterbi track for the data of Fig. 2(a), showing the effects of varying the process noise d. Values of d are: (a) 0.667, (b) 0.333, (c) 0.222, (d) 0.167. Other parameters are the same as in Fig. 3.

The dependence of the probability of zero state occupancy on the tracker SNR exhibits a similar range of behavior to that of the Viterbi track. In Fig. 7(a)-(d), the probabilities of zero state occupancy are shown for the same range of HMM parameters as was used in Fig. 6(a)-(d). For a tracker SNR of -25.2 dB, the probability of zero state occupancy is always less than 0.5 (except near the start of the data sequence), and the Viterbi track terminates only once. This termination is a punctuation caused by the necessity of reducing the cost of making a spurious five-cell frequency change. As the tracker SNR increases, so does the probability of zero state occupancy. In the region where the tracker SNR is considerably larger than the true SNR, this probability becomes excessively high, and spurious terminations occur in the Viterbi track.

Fig. 6. The Viterbi tracks for the data of Fig. 2(b), showing the effects of varying the tracker SNR. Values of the tracker SNR are: (a) -25.2 dB, (b) -23.0 dB, (c) -20.0 dB, (d) -17.0 dB.

Fig. 7. The probability of zero state occupancy for the data of Fig. 2(b), showing the effects of varying the tracker SNR. The parameter values are the same as in Fig. 6. The zero state occupancies associated with the Viterbi tracker are also shown for comparison.

From the data shown in Figs. 1-7, it is apparent that both the Viterbi and MCO tracks provide very good reconstructions of the hidden, true signal behavior. The two tracks are mutually consistent. Track termination and initiation capabilities are intrinsic to the Viterbi track and can be built into the MCO track. Figs. 6-7 indicate that optimal performance of the trackers depends on the tracker parameters being suitably matched to the signals under investigation. However, it has been our experience that even where some mismatch of parameters is unavoidable (e.g., the tracking of real signals), the two tracks still provide remarkably good reconstructions of the underlying signal. They also agree consistently with what would be obtained from careful inspection of the original spectrograms.

V.  Comparisons  to  Related  Trackers 

A.  Formant  Tracking 

Kopec [7] studies the problem of tracking formants in speech using HMM’s. Formant tracking is similar to frequency line tracking, and Kopec’s paper and this paper have much in common. Both papers use finite-state, finite-outcome HMM’s, and both use the same kinds of states. Moreover, Kopec uses a “distinguished nonnumerical state” to represent the absence of a formant, just as we have used the zero state to indicate the absence of a track in the gate. The different applications, however, require different definitions of the measurement sequence. Kopec uses the codebook vectors that result from vector quantization, whereas we use the frequency state estimates of a threshold detector.

A significant difference between Kopec’s paper and our paper is that he uses the MCO track, but not the Viterbi track. One reason he rejects the Viterbi track is that his frequency cells are from 50 to 100 Hz wide, so any discrete track estimate is inherently unacceptable. In our application, however, the frequency cells can be made as small as desired by increasing the FFT resolution. Another reason he does not use the Viterbi track is that it does not provide a way to define or set a formant detection threshold. In our application, we are able to define the HMM tracker so that there is excellent correlation between zero state occupancy for the Viterbi track and low values of the GOP function. Therefore, there is no necessity to set a threshold on the GOP to determine the presence or absence of the track. Nonetheless, in practice, we would still set such a threshold for the GOP.

B. Relationship with a Dynamic Programming Tracking Method

In a set of earlier papers (e.g., see [8]), Scharf et al. present a dynamic programming method for tracking frequency and phase. Although they do not identify it as such, their algorithm is equivalent to an HMM using real-valued continuous measurement vectors. They assume the frequency is constant for the duration of each block of time series data, and then allow a transition to another of a set of discrete frequencies. These discrete frequencies correspond to the states of the underlying HMM. They do not, however, include a zero state to indicate the absence of a signal. They use the Viterbi track exclusively and do not discuss the MCO track.

The fundamental equation of Scharf et al. is the logarithmic likelihood function, denoted LLF, which is given by

$$\mathrm{LLF} = -\sum_{t=1}^{T} \frac{1}{2\sigma_t^2} \left\| z_t - s_t \right\|^2 + \sum_{t=1}^{T} \ln p(x_t \mid x_{t-1}). \qquad (5.1)$$

Here, $\{x_1, \ldots, x_T\}$ denotes a sequence of discrete frequency states, the vector $z_t$ is a block of time series data commencing at time t, the vector $s_t$ is the complex exponential signal vector corresponding to a frequency $x_t$, and $\sigma_t$ is the standard deviation of the random background noise at time t. The measurements in the HMM of Scharf et al. are the observed time series data blocks. Thus, the first term in the LLF is equivalent to the B matrix in the HMM, and the second term in the LLF is equivalent to the A matrix of the HMM. The dynamic programming algorithm that Scharf et al. present for maximizing (5.1) is equivalent to the Viterbi algorithm (2.5)-(2.6).
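To make the equivalence concrete, the following sketch maximizes (5.1) over a grid of discrete frequencies using the Viterbi recursion. It is our illustration, not code from Scharf et al. The following are assumptions. Each block $z_t$ has length M. The signal vector is $s_t = \exp(2\pi i f m)$ for m = 0..M-1, with unit amplitude and zero phase. $\sigma_t$ is constant. The caller supplies the log transition matrix `logA`. The t = 1 transition term is replaced by a log initial distribution `log_pi`. The function name is hypothetical.

```python
import numpy as np


def viterbi_llf(z, freqs, sigma, logA, log_pi):
    """Frequency path maximising the LLF (5.1) by dynamic programming (sketch).

    z      : complex array (T, M), one data block per row.
    freqs  : array (K,) of candidate normalised frequencies (cycles/sample).
    sigma  : noise standard deviation (assumed constant over t).
    logA   : array (K, K); logA[i, j] = ln p(x_t = j | x_{t-1} = i).
    log_pi : array (K,); log-probability of the first state (assumed).
    Returns (best state-index path, maximised LLF value).
    """
    T, M = z.shape
    K = len(freqs)
    m = np.arange(M)
    S = np.exp(2j * np.pi * np.outer(freqs, m))           # (K, M) signal vectors
    # First LLF term for every (t, state): -||z_t - s||^2 / (2 sigma^2)
    sq_err = (np.abs(z[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    loglik = -sq_err / (2 * sigma**2)                     # (T, K)

    score = log_pi + loglik[0]
    back = np.zeros((T, K), dtype=int)                    # best predecessors
    for t in range(1, T):
        cand = score[:, None] + logA                      # cand[i, j]
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + loglik[t]

    path = [int(np.argmax(score))]                        # backtrack
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(score))
```

With noise-free blocks and uniform transition probabilities, the recovered path is exactly the sequence of frequencies used to build the data. This is a quick sanity check that the recursion is wired correctly.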

C.  Bayes-Markov  Tracking 

Jaffer et al. [9] present a recursive Bayesian technique for tracking dynamic signals in noise. Although they do not present it as an HMM, their technique is equivalent to a finite-state HMM using real-valued continuous measurement vectors. It uses a one time-step recursion to update the posterior pdf of the state conditioned on the measured data, but it does not treat sequences of measurements collectively. The MCO track, if used with zero lag, is similar to the tracker of Jaffer et al.

Jaffer et al. define the states of their model to be FFT resolution cells, just as we have done, but they do not define a zero state to denote the absence of a signal. A measurement is a (real) vector whose components are the magnitudes of the output FFT’s in the resolution cells. Let $P[X_t = j \mid Z_t]$ denote the posterior pdf of the signal location in cell j conditioned on the entire measurement sequence $Z_t$. The fundamental equation of their method (neglecting scale factors) is

$$P[X_t = j \mid Z_t] = b(z_t \mid X_t = j) \sum_{i=1}^{n} a_{ij}\, P[X_{t-1} = i \mid Z_{t-1}] \qquad (5.2)$$

where $b(z_t \mid X_t = j)$ is the pdf of the current measurement vector $z_t$ conditioned on state j. Using Bayes’ theorem, they show that

$$b(z_t \mid X_t = j) = f_{S+N}(z_t(j)) \,/\, f_N(z_t(j))$$

where $f_{S+N}(z_t(j))$ and $f_N(z_t(j))$ are the signal-plus-noise and noise-only pdf’s, respectively, of the data in cell j. They define the initial state probability vector to be uniform. Bayes’ theorem can be used to show that (5.2) is identical to one step of the HMM forward algorithm (see (2.10a)).
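One update of (5.2) can be written as a single normalized step of the forward recursion. In this sketch, the per-cell likelihood ratios $f_{S+N}/f_N$ are supplied by the caller rather than computed from any particular pdf model, since that choice belongs to Jaffer et al. The final normalization restores the scale factor that (5.2) neglects. The function name is hypothetical.

```python
import numpy as np


def bayes_markov_step(prev_post, A, lik_ratio):
    """One update of (5.2): P[X_t=j|Z_t] ~ b_j * sum_i a_ij P[X_{t-1}=i|Z_{t-1}].

    prev_post : array (n,), posterior at time t - 1 (sums to 1).
    A         : array (n, n), A[i, j] = a_ij = P(X_t = j | X_{t-1} = i).
    lik_ratio : array (n,), b(z_t | X_t = j) = f_{S+N}(z_t(j)) / f_N(z_t(j)).
    """
    predicted = prev_post @ A          # sum_i a_ij P[X_{t-1} = i | Z_{t-1}]
    post = lik_ratio * predicted       # weight by the current measurement
    return post / post.sum()           # restore the neglected scale factor
```

Following Jaffer et al., the recursion would be started from a uniform vector, for example `np.full(n, 1.0 / n)`.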

Jaffer et al. do not explicitly define the A matrix, but the kind of matrix they have in mind is clear from the context and from the two interesting examples they present. (One example is pulsed radar tracking of range and Doppler, and the other is passive sonar tracking of Doppler and delay.) They also give a detection statistic that they claim enhances signal detection despite target motion, but they do not present examples of its use or discuss the false alarm rates that might be anticipated.

VI.  Possible  Extensions  of  the  HMM  Tracker 

It  is  seen  in  Section  IV  that  the  HMM  tracker  is  an 
excellent  algorithm  for  frequency  line  tracking,  provided 
that  the  underlying  HMM  is  optimized  for  the  line  under 
study.  A  number  of  extensions  to  the  present  work  that 
could  enhance  the  performance  of  the  HMM  tracker  even 
further  are  now  discussed. 

In the application of HMM’s to speech processing, the training of HMM’s is a well-established concept. Training is mentioned briefly in Section II, but it is unnecessary for the analyses carried out here for simulated data. With real data, however, training the HMM will enable the determination of the optimal values of the A matrix, B matrix, and $\pi$ for the line being considered. These parameters are likely to depend on the SNR, and on the nature and amplitude of the frequency modulation of the line. Suitable training of the HMM should result in better frequency tracking for real data.

Extending the present tracker to include the possibility of more than one line being present in the frequency gate would enable the tracking of lines whose frequencies are close together, and would include the possibility of frequency tracks crossing. One implementation of such an extension is to allow multiple detections when the spectral power in more than one frequency cell lies above the detection threshold D. Each state is then no longer uniquely associated with a frequency cell or the zero state, but describes one of the following possibilities: 1) no detection in any frequency cell, 2) a detection in only one frequency cell, or 3) detections in two (or more) frequency cells. The A and B matrices would have to be reformulated in a manner consistent with this interpretation.
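To give a sense of how the state space grows under this extension, the sketch below enumerates states as sets of detected cells, with at most `max_lines` simultaneous detections. The cap is an assumed parameter, since the text allows "two (or more)". With the nine-cell gate of Section IV and a cap of two, there are 1 + 9 + 36 = 46 states, compared with 10 for the single-line tracker. The function name is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def multiline_states(n: int, max_lines: int) -> List[Tuple[int, ...]]:
    """Enumerate hypothetical multi-line HMM states for an n-cell gate.

    Each state is a tuple of detected cell indices (1..n). The empty tuple
    plays the role of the zero state (no detection in any cell).
    """
    states: List[Tuple[int, ...]] = []
    for k in range(max_lines + 1):
        states.extend(combinations(range(1, n + 1), k))
    return states
```

The reformulated A and B matrices would then be indexed by these tuples, so their size grows with the square of the state count.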

The concept of an HMM state can also be extended to incorporate both the frequency and its time derivative. The advantage of such an extension is that the dynamic characteristics of the line tracks can be more completely represented by the A matrix, with a consequent improvement in the tracker performance. The disadvantage is that the number of states is increased by a factor equal to the number of time derivative resolution cells, with a consequent increase in the required computing time. The proposed extension would enable a more meaningful comparison of the HMM tracker to existing alpha-beta and Kalman trackers (see, e.g., [11]-[13]), which typically use a track derivative model.
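The computing-time penalty can be illustrated with a short calculation. This sketch makes two assumptions: a single zero state is kept rather than replicated for each derivative cell, and the transition matrix is dense, so one update costs about $N^2$ multiply-adds for N states. Both function names are hypothetical.

```python
def augmented_state_count(n_freq: int, n_deriv: int, zero_state: bool = True) -> int:
    """States when each gate state is a (frequency, derivative) pair.

    Assumption: one shared zero state, not replicated per derivative cell.
    """
    return n_freq * n_deriv + (1 if zero_state else 0)


def dense_step_cost(n_states: int) -> int:
    """Multiply-adds for one dense transition-matrix-vector product."""
    return n_states * n_states
```

For the nine-cell gate with five derivative cells, the state count rises from 10 to 46. The per-step cost of a dense update rises from 100 to 2116 multiply-adds, roughly the square of the growth factor.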

The examples presented in Section IV are investigated using a finite-time window of length T = 100. The possibility of sliding windows is discussed in Section III-C. For frequency tracks that change substantially in frequency over a long time, it would clearly be computationally advantageous to employ smaller windows and an “adaptive” gate whose center frequency and width change as the window slides over the data sequence. The use of an adaptive gate, combined with the treatment of multiple tracks within a gate, would significantly extend the range of applicability of the HMM tracker.

Finally, it is emphasized that the data input to the current HMM tracker is in the form of a measurement sequence obtained from a simple threshold detector. The tracker is therefore denied access to important frequency estimation information, e.g., the amplitude and phase of the complex FFT’s associated with each frequency cell. Extensions that allow for measurement sequences of a more sophisticated nature than the output of a simple threshold detector clearly warrant further investigation. A possible alternative measurement sequence for input to the tracker could be the instantaneous frequencies obtained from a Wigner-Ville time-frequency analysis of the data.

VII.  Conclusions 

In this paper, the application of HMM’s to the problem of frequency tracking is presented and discussed, and the interpretation of several earlier tracking algorithms in terms of HMM’s is pointed out for the first time. Section III demonstrates how to formulate the frequency tracking problem in terms of HMM’s and how to optimize the HMM parameters for this problem. In Section IV, the HMM tracker is tested by application to simulated data. The resultant tracks are comparable to those obtained by inspection of the spectrograms, and are found to be robust with respect to variations in the HMM parameters.

Tracker optimization is important and is responsible for the observed consistency between the Viterbi and the MCO tracks, and for the reliable initiation and termination of the tracks. We propose three different track initiation and termination criteria based on the Viterbi track, the standard deviation of the MCO track, and the GOP function. The mutual consistency of the results obtained by using these criteria is a testimony to the successful optimization of the HMM parameters.

The present work is capable of extension in several directions. A number of possible extensions are outlined in Section VI. These issues are the subject of current investigation, and will be addressed in subsequent publications.

References 

[1] D. C. Rife and R. R. Boorstyn, “Single-tone parameter estimation from discrete-time observations,” IEEE Trans. Inform. Theory, vol. IT-20, pp. 591-598, Sept. 1974.

[2] S. M. Kay and S. L. Marple, Jr., “Spectrum analysis - A modern perspective,” Proc. IEEE, vol. 69, pp. 1380-1419, Nov. 1981.

[3] D. R. A. McMahon and R. F. Barrett, “An efficient method for estimation of the frequency of a single tone in noise from the phases of discrete Fourier transforms,” Signal Processing, vol. 11, pp. 169-177, Sept. 1986.

[4] R. F. Barrett and D. R. A. McMahon, “Comparison of frequency estimators for underwater acoustic data,” J. Acoust. Soc. Amer., vol. 79, pp. 1461-1471, May 1986.

[5] L. R. Rabiner and B. H. Juang, “An introduction to hidden Markov models,” IEEE ASSP Mag., vol. 3, pp. 4-16, Jan. 1986.

[6] S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, “An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition,” Bell Syst. Tech. J., vol. 62, pp. 1035-1074, Apr. 1983.

[7] G. E. Kopec, “Formant tracking using hidden Markov models and vector quantization,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 709-729, Aug. 1986.

[8] L. L. Scharf and H. Elliott, “Aspects of dynamic programming in signal and image processing,” IEEE Trans. Automat. Contr., vol. AC-26, pp. 1018-1029, Oct. 1981.

[9] A. G. Jaffer, R. L. Stoutenborough, and W. B. Green, “Improved detection and tracking of dynamic signals by Bayes-Markov techniques,” in Proc. ICASSP’83, vol. 2, pp. 575-578.

[10] L. E. Baum, T. Petrie, G. Soules, and N. Weiss, “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains,” Ann. Math. Statist., vol. 41, no. 1, pp. 164-171, 1970.

[11] T. R. Benedict and G. W. Bordner, “Synthesis of an optimal set of radar track-while-scan smoothing equations,” IEEE Trans. Automat. Contr., vol. AC-7, pp. 27-32, July 1962.

[12] Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association. New York: Academic, 1988.

[13] R. F. Barrett, A. K. Steele, and R. L. Streit, “Frequency line tracking algorithms,” in Proc. NATO Adv. Study Inst. Underwater Acoustic Data Processing, Kingston, Ont., Canada, July 1988.


Roy L. Streit (SM’84) was born in Guthrie, OK, on October 14, 1947. He received the B.A. degree (with Honors) in mathematics and physics from East Texas State University, Commerce, in 1968, the M.A. degree in mathematics from the University of Missouri, Columbia, in 1970, and the Ph.D. degree in mathematics from the University of Rhode Island, Kingston, in 1978.

He was a Visiting Scholar in the Department of Operations Research, Stanford University, Stanford, CA, during 1981-1982, and an Exchange Scientist in the Signal Processing and Classification Group at the Defence Science and Technology Organisation, Adelaide, South Australia, from 1987 to 1989. He joined the staff of the Naval Underwater Systems Center (then the Navy Underwater Sound Laboratory), New London, CT, in 1970. He is an Applied Mathematician and has published work in several areas, including towed array design, complex function approximation, semi-infinite programming, and applications of hidden Markov models. His current interests include image analysis, tracking problems, and training algorithms for neural networks.


Ross F. Barrett was born in Fremantle, Western Australia, on August 19, 1942. He received the B.Sc. (Honours) degree in physics in 1964 and the Ph.D. degree in physics in 1969, both from the University of Western Australia.

He was an Alexander von Humboldt Fellow at the University of Frankfurt, West Germany (1969-1972), a Research Fellow at the University of Melbourne (1972-1974), a Research Fellow at the Australian National University, Canberra (1974-1978), and a Lecturer in Physics at the University of Western Australia (1978-1982). Until this period, his primary research interests were in theoretical and experimental nuclear physics. In 1982 he joined the staff of the Defence Science and Technology Organisation, Salisbury, South Australia, where he is presently a Principal Research Scientist. His current research interests are in signal processing techniques and passive sonar classification.




Nonlinear  Frequency  Line  Tracking  Algorithms 


A. K. Steele, R. L. Streit and R. F. Barrett




Nonlinear  Frequency  Line  Tracking  Algorithms 

A.K.  STEELE 

Maritime Systems Div., Weapons Systems Research Laboratory, Salisbury, S.A.

R.L.  STREIT 

Naval Underwater Systems Center, New London, Connecticut, U.S.A.

R.F.  BARRETT 

Maritime Systems Div., Weapons Systems Research Laboratory, Salisbury, S.A.


1  INTRODUCTION 

The problem of producing accurate target tracks from noisy measurements is of great current interest for the automation of signal processing systems. Trackers are classified as either linear or nonlinear systems. Examples of linear trackers are the alpha-beta and Kalman trackers. Examples of nonlinear trackers are the probabilistic data association (PDA) and hidden Markov model (HMM) trackers. This paper deals exclusively with these two nonlinear trackers.

Linear trackers estimate the track using simple deterministic or statistical dynamic target motion models to develop filters for the target position estimates. The problem with linear trackers is that they are sensitive to outliers and false measurements. The nonlinear PDA methodology can be applied to (single input) linear trackers to overcome the problems caused by outliers and multiple input measurements. Application of the PDA methodology to the Kalman tracker results in the PDA-Kalman tracker studied in this paper.

The HMM tracker, which is a recent development, models the target measurement sequence as a probabilistic function of a Markov chain. The states of the Markov chain define the target states, and the transition probability matrix of the Markov chain defines possible target motion. The probabilistic function describes exactly the non-Gaussian nature of the measurement process. The HMM tracker provides a unified mathematical framework for describing important tracking problem issues. In particular, the HMM tracker initiates and terminates tracks automatically as an intrinsic feature of the tracking algorithm. It does this by incorporating a special state into the Markov chain to designate the absence of a target. The HMM tracking algorithm is equivalent to a sequence of matrix-vector products.

A new development is the HMM/A tracker, which is an
HMM tracker that also uses the amplitude of the
input measurements as additional information. The
inclusion of amplitude information does not
significantly increase the complexity of the HMM/A
tracker over that of the HMM tracker. The HMM/A
tracker is the only tracker in this paper that uses
amplitude information. (For a discussion of the
performance of the HMM/A tracker as a signal
detector, see Barrett & Streit, 1989.)

This paper compares the PDA, HMM, and HMM/A
tracking algorithms when used for frequency line
tracking on two different simulated data sets.
These examples (and others not given here) show
that quantitative comparisons of frequency line
tracking algorithms (FLTAs) require careful
statistical analysis. At high SNRs, adequate
comparisons can be made using simple error measures
(e.g., rms tracking error) on a few data sets, but
at low SNRs such measures can be misleading, and
one must resort to ensemble statistical measures of
tracker performance. Such statistical performance
comparisons are especially important when
discussing track initiation and track termination.

2  DEFINITIONS 

In this paper, we treat the problem of tracking the
time variation of the instantaneous frequency of an
isolated tone embedded in additive white noise as a
post-detection process applied to the evolving
short-term Fourier spectra of the sampled time
series. Separating the tracking problem from the
detection problem can lead to suboptimal tracking
performance, but it is an approach commonly used in
practice.

The choice of detector is important because the
output of the detector determines the tracker input
measurements. Throughout this paper we use a simple
threshold detector because of its widespread usage
and ease of implementation. No interpolation is
employed to smooth the intrinsic quantization
effects of this detector, because interpolation is
not justified at low SNR. Other well-known
problems associated with the threshold detector are
outliers (false detections) and missed detections.
These problems cause serious tracking errors in
linear trackers; however, as the examples will
show, our PDA, HMM and HMM/A trackers are robust
against outliers and missed detections and are
capable of tracking down to the input quantization
level.

All the trackers considered here can, in principle,
accept multiple measurements as input; i.e., the
tracker input is the set (possibly empty) of centre
frequencies of all FFT cells whose amplitude
exceeds the detection threshold. If the detection
threshold is not exceeded in any cell, no frequency
measurement is made for the current scan, or block
of time series data. Alternatively, the tracker
input can be a single measurement, i.e., the centre
frequency of the FFT cell having the largest
amplitude, provided that amplitude exceeds the
threshold.
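The two input modes described above can be sketched as follows. This is a minimal illustration, not the authors' detector. It assumes each scan arrives as a list of FFT-cell amplitudes already normalised by the average noise level, and it returns cell indices in place of centre frequencies. The function names `detect_all` and `detect_strongest` are hypothetical.

```python
from typing import List, Optional


def detect_all(amplitudes: List[float], threshold: float) -> List[int]:
    """Multiple-measurement input: every FFT cell above threshold (possibly empty)."""
    return [k for k, a in enumerate(amplitudes) if a > threshold]


def detect_strongest(amplitudes: List[float], threshold: float) -> Optional[int]:
    """Single-measurement input: the largest cell, or None if it misses the threshold."""
    if not amplitudes:
        return None
    k = max(range(len(amplitudes)), key=lambda i: amplitudes[i])
    return k if amplitudes[k] > threshold else None


# One scan of normalised cell amplitudes (illustrative values only).
scan = [0.4, 2.1, 0.9, 3.0, 1.2]
print(detect_all(scan, threshold=1.8))
print(detect_strongest(scan, threshold=1.8))
```

An empty list from `detect_all`, or `None` from `detect_strongest`, corresponds to a scan in which no frequency measurement is made.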

The PDA trackers considered here use either single
or multiple measurements and ignore the amplitude
of the measurements. The HMM tracker considered
here accepts only single measurements, while the
HMM/A tracker uses single measurements together
with their associated amplitudes.

3  PROBABILISTIC  DATA  ASSOCIATION  TRACKING 
We present PDA as a method for converting a
single-input-single-output (SISO) tracker into a
multiple-input-single-output tracker. PDA assumes
that only one of the multiple input measurements
corresponds to the target being tracked. It further
assumes that measurements in the next scan will be
normally distributed with the mean and covariance
predicted from the current scan. PDA is thus
applicable to any SISO tracker in which the
measurement mean and covariance at the next scan
are predicted.

Track input measurements are gated, thus creating
the possibility of false dismissal of the target
measurement. Using the PDA assumptions, the
probability of false dismissal is easily evaluated.
The usual gated PDA method uses a variable gate to
achieve a constant probability of false dismissal.
For the FLTA, we use a fixed gate, so the false
dismissal probability varies from scan to scan.
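With a fixed gate, the false dismissal probability follows directly from the Gaussian prediction assumed by PDA. The sketch below assumes the gate is a fixed frequency interval `[gate_lo, gate_hi]` and that the predicted target measurement is normal with mean `pred_mean` and variance `pred_var`. These names, and the cell-edge convention in the example, are illustrative and are not taken from the paper.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def false_dismissal_prob(pred_mean: float, pred_var: float,
                         gate_lo: float, gate_hi: float) -> float:
    """P(target measurement falls outside a fixed gate) under a Gaussian prediction."""
    sigma = math.sqrt(pred_var)
    p_in_gate = (normal_cdf((gate_hi - pred_mean) / sigma)
                 - normal_cdf((gate_lo - pred_mean) / sigma))
    return 1.0 - p_in_gate


# Fixed 9-cell gate covering cells 1..9 (edges at 0.5 and 9.5, an assumption).
for mean in (5.0, 9.0):
    print(mean, false_dismissal_prob(mean, 1.0, 0.5, 9.5))
```

With unit variance, a prediction at the gate centre is almost never dismissed. A prediction at cell 9 is dismissed with a probability of about 0.31, which is why the probability changes from scan to scan.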

Let $\beta_i$ be the probability that the $i$-th measurement
corresponds to the target, and let $\beta_0$ be the
probability that none of the measurements
corresponds to the target. Similarly, let $f_i$ be
the track frequency estimate generated by using the
$i$-th measurement in the underlying SISO tracker,
and let $f_0$ be the predicted track frequency when no
measurement is made. Then the PDA tracker output
is given by

$$ f_{PDA} = \beta_0 f_0 + \sum_{i=1}^{N} \beta_i f_i \qquad (1) $$

where $N$ is the total number of detections in the
gate. For the PDA Kalman tracker, the error
covariance associated with $f_{PDA}$ is also computed
(Bar-Shalom & Fortmann, 1988), and it is used with
$f_{PDA}$ to make the measurement mean and covariance
predictions in the underlying SISO Kalman filter.
The probabilities $\{\beta_i\}$ are easily computed and
require only the evaluation of a truncated Gaussian
density function. The nonlinear dependence of all
PDA trackers on the measured track input data is
due to the nonlinear dependence of the
probabilities $\{\beta_i\}$ on the data.
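Equation (1) and the weights $\{\beta_i\}$ can be illustrated with a short sketch. The paper does not give the weight formula. The code therefore assumes the parametric PDA weights commonly given for a scalar measurement (e.g. in Bar-Shalom & Fortmann, 1988), which involve a clutter density `lam`, a detection probability `pd` and a gate probability `pg`. These parameters, their defaults, and the fixed-gain stand-in for the underlying SISO tracker are assumptions for illustration only.

```python
import math
from typing import List, Tuple


def pda_weights(innovations: List[float], pred_var: float, lam: float = 0.1,
                pd: float = 0.9, pg: float = 0.99) -> Tuple[float, List[float]]:
    """Return (beta_0, [beta_1..beta_N]) for scalar innovations z_i - predicted mean."""
    # Gaussian likelihood kernel of each gated measurement (common factors cancel).
    e = [math.exp(-0.5 * v * v / pred_var) for v in innovations]
    # Term for the hypothesis that no gated measurement came from the target.
    b = lam * math.sqrt(2.0 * math.pi * pred_var) * (1.0 - pd * pg) / pd
    total = b + sum(e)
    return b / total, [ei / total for ei in e]


def pda_output(f0: float, f_list: List[float], beta0: float, betas: List[float]) -> float:
    """Equation (1): f_PDA = beta_0 * f_0 + sum_i beta_i * f_i."""
    return beta0 * f0 + sum(b * f for b, f in zip(betas, f_list))


# Predicted frequency 5.0 with two gated detections at 5.2 and 8.0.
f0 = 5.0
meas = [5.2, 8.0]
beta0, betas = pda_weights([z - f0 for z in meas], pred_var=1.0)
# Hypothetical SISO update with gain 0.5 standing in for the underlying tracker.
f_list = [f0 + 0.5 * (z - f0) for z in meas]
print(beta0, betas, pda_output(f0, f_list, beta0, betas))
```

When the gate is empty, $\beta_0 = 1$ and the output reduces to the prediction $f_0$.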

4 HIDDEN MARKOV MODEL TRACKING

HMMs are probabilistic models that are commonly
used in speech applications. Their utility in
tracking applications seems not to be recognised in
the general literature, except for a paper by Kopec
(1986), who uses them to track formants, or
resonances, in spoken words. The HMM tracker
presented in this paper is similar to Kopec's
formant tracker; however, the FLTA application
permits the analytical development of the
parameters defining the underlying HMM.

The HMM tracker is a fixed interval FLTA; i.e., it
takes a fixed length sequence, or "window", of
measurements and outputs a track estimate for each
time in the window. By sliding the window along as
new data are collected, the track estimate evolves
in time. Alternatively, one may simply increase
the window size. Either way, the HMM tracker is
used only to compute the output track estimate for
each given tracking window. The quantised
frequency track is modelled as a finite state
Markov chain. A "faded" or "zero" state represents
a track whose SNR is less than the tracker SNR (see
below); the remaining "active" states represent a
track occupying an FFT cell inside a fixed gate and
having an SNR greater than the tracker SNR. Track
initiation is defined as a transition from the
zero state to any active state, while track
termination is defined as a transition from any
active state into the zero state. Initiation and
termination of tracks, as well as movement of the
track within the gate, are therefore governed by
the transition probability matrix, A, of the Markov
chain.
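To make the role of A concrete, the sketch below builds a transition matrix in which the zero state is state 0 and each gate cell is one active state. The structure is an assumption for illustration: uniform initiation into any active cell, a constant termination probability, and moves only to adjacent cells. The parameter names `p_init`, `p_term` and `p_move` are also assumptions. The paper develops its parameters analytically and does not list them here.

```python
from typing import List


def build_transition_matrix(n_active: int, p_init: float, p_term: float,
                            p_move: float) -> List[List[float]]:
    """Row-stochastic A: state 0 is the zero state, states 1..n_active are gate cells."""
    n = n_active + 1
    A = [[0.0] * n for _ in range(n)]
    # Zero state: remain absent, or initiate into any active cell with equal probability.
    A[0][0] = 1.0 - p_init
    for j in range(1, n):
        A[0][j] = p_init / n_active
    for i in range(1, n):
        A[i][0] = p_term                          # track termination
        A[i][i] = 1.0 - p_term - p_move           # track stays in its cell
        nbrs = [j for j in (i - 1, i + 1) if 1 <= j <= n_active]
        if not nbrs:                              # single-cell gate: nowhere to move
            A[i][i] += p_move
        for j in nbrs:                            # track moves to an adjacent cell
            A[i][j] = p_move / len(nbrs)
    return A


# A 9-cell gate as in the examples; the probabilities are hypothetical.
A = build_transition_matrix(n_active=9, p_init=0.05, p_term=0.05, p_move=0.2)
print([round(sum(row), 12) for row in A])
```

Every row sums to one, so initiation, termination and in-gate motion all compete within the same row of A.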

Measurements of the frequency track are
characterised by a detection probability matrix, B.
Thus, for each possible target state, the threshold
detector outputs are measured target states with
probabilities that are computed analytically from
the SNR and the threshold. The SNR assumed for
this B-matrix calculation is called the tracker
SNR; in effect, it is the lowest SNR at which
tracks are initiated and estimated.

The HMM tracker is fully specified by the
A- and B-matrices, together with the initial state
probability vector $\pi$. Initially, $\pi$ corresponds to
a target in the zero state. This forces automatic
track initiation. If the tracking window slides
along as new data are collected, $\pi$ is updated using
the current HMM tracker output.

The HMM/A tracker cannot use a B-matrix because
amplitude is a continuous quantity. Instead, it is
necessary to compute the likelihoods of the
measurements, conditioned on each possible target
state. There are a finite number of states, so
these conditional likelihoods can be stored as a
matrix.

The HMM and HMM/A trackers output both a discrete
(quantised) track and a continuous track. The
quantised track is the Viterbi track; i.e., of all
possible tracks (realisations of the Markov chain),
the Viterbi track is the one most likely to account
for the measurement sequence. The continuous track
is essentially the expected track, with the
expectation taken over all possible realisations of
the Markov chain. Strictly speaking, the expected
track is conditioned on the track not occupying the
zero state, as well as on the measurements. The
total probability of the zero state track
(conditioned only on the measurements) at each
point in the window is thus a necessary complement
to the continuous track estimate. The gate
occupancy probability (GOP) is defined as one minus
the probability of the zero state track, and it is
the GOP that is plotted in the examples in the next
section.
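The continuous track and the GOP both follow from the posterior state probabilities. The sketch below computes these with the standard scaled forward-backward recursion. It assumes discrete observations indexed into a B-matrix (symbol 0 for "no detection", symbol k for a detection in cell k), with state 0 as the zero state. This layout, the function names, and the small three-state model are hypothetical and are not the paper's parameters.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward_backward(A: Matrix, B: Matrix, pi: Sequence[float],
                     obs: Sequence[int]) -> Matrix:
    """Posterior probabilities gamma[t][j] = P(state j at scan t | all measurements)."""
    n, T = len(pi), len(obs)
    alpha: Matrix = []
    scale: List[float] = []
    for t in range(T):
        row = []
        for j in range(n):
            prior = pi[j] if t == 0 else sum(alpha[t - 1][i] * A[i][j] for i in range(n))
            row.append(prior * B[j][obs[t]])
        c = sum(row)                      # per-scan scaling avoids underflow
        scale.append(c)
        alpha.append([a / c for a in row])
    beta: Matrix = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    gamma: Matrix = []
    for t in range(T):
        g = [alpha[t][j] * beta[t][j] for j in range(n)]
        s = sum(g)
        gamma.append([x / s for x in g])
    return gamma


def gop_and_expected_track(gamma: Matrix, cell_freqs: Sequence[float]
                           ) -> Tuple[List[float], List[float]]:
    """GOP = 1 - P(zero state); expected track conditioned on not being in the zero state."""
    gop, track = [], []
    for g in gamma:
        p_active = sum(g[1:])
        gop.append(p_active)
        if p_active > 0.0:
            track.append(sum(p * f for p, f in zip(g[1:], cell_freqs)) / p_active)
        else:
            track.append(float("nan"))    # undefined when the zero state is certain
    return gop, track


# Hypothetical model: state 0 = zero state, states 1-2 = cells at frequencies 1.0 and 2.0.
A = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
B = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.2, 0.1, 0.7]]  # columns: none, cell 1, cell 2
pi = [1.0, 0.0, 0.0]                                    # start in the zero state
gop, track = gop_and_expected_track(forward_backward(A, B, pi, [0, 1, 1, 1, 0]), [1.0, 2.0])
print(gop, track)
```

Each scan of the forward and backward passes costs $n^2$ multiplications, which matches the count quoted below for the continuous output.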

The discrete output of the HMM tracker is computed
using only $n^2T$ additions, while the continuous
output uses $n^2T$ multiplications, where $n$ is the
number of Markov chain states and $T$ is the number
of scans in the window. The discrete and continuous
algorithms are easily vectorised. Similar remarks
hold for the HMM/A tracker once the necessary
conditional likelihoods have been computed.
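The $n^2T$ addition count for the discrete output becomes clear from a sketch of the Viterbi recursion. Working with log probabilities turns products into additions, so each scan costs about $n$ additions and comparisons for each of the $n$ states. The three-state model in the example is hypothetical, with state 0 as the zero state. The paper's own A- and B-matrices are not reproduced here.

```python
import math
from typing import List, Sequence


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(A: List[List[float]], B: List[List[float]], pi: Sequence[float],
            obs: Sequence[int]) -> List[int]:
    """Most likely state sequence (the discrete track) over a window of scans."""
    n = len(pi)
    logA = [[_log(a) for a in row] for row in A]
    logB = [[_log(b) for b in row] for row in B]
    delta = [_log(pi[j]) + logB[j][obs[0]] for j in range(n)]
    backptrs: List[List[int]] = []
    for t in range(1, len(obs)):
        new_delta, ptr = [], []
        for j in range(n):
            # n candidate additions per state, so about n^2 additions per scan.
            scores = [delta[i] + logA[i][j] for i in range(n)]
            best = max(range(n), key=scores.__getitem__)
            new_delta.append(scores[best] + logB[j][obs[t]])
            ptr.append(best)
        delta = new_delta
        backptrs.append(ptr)
    state = max(range(n), key=delta.__getitem__)
    path = [state]
    for ptr in reversed(backptrs):        # trace back from the last scan
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical model: state 0 = zero state, states 1-2 = gate cells.
A = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
B = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.2, 0.1, 0.7]]  # columns: none, cell 1, cell 2
pi = [1.0, 0.0, 0.0]
print(viterbi(A, B, pi, [0, 1, 1, 1, 0]))
```

For this toy model, the Viterbi track stays in the zero state for the first scan and initiates into cell 1 at the first detection. It then remains in cell 1 through the final missed detection.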

5 EXAMPLES

The simulated data were obtained by generating a
sine wave triangularly swept in frequency with a
period of 132 scans, a centre frequency of 5, and a
maximum deviation of 3. Uncorrelated noise is then
added to the sine wave, and the result is Fourier
transformed and passed to a threshold detector. One
hundred scans of the output of the threshold
detector are then used as the input for the various
trackers. The signal is absent until scan 15, when
the SNR is increased instantaneously to 3 dB (in an
FFT cell). The SNR is kept at that level until scan
79, when the signal ceases. All the trackers use a
gate width of 9 FFT cells.
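The noise-free part of this simulation is easy to reproduce. The sketch below assumes frequencies are measured in FFT-cell units and that the triangular sweep starts at the centre frequency, moving upwards, at scan 0. The paper does not state the starting phase, so that choice is an assumption. The sketch models only the true track and the interval during which the signal is present. It does not model the noise or the detector.

```python
from typing import List, Optional


def true_frequency(scan: int, period: int = 132, centre: float = 5.0,
                   max_dev: float = 3.0) -> float:
    """Triangular frequency sweep (assumed to start at the centre, sweeping upwards)."""
    x = (scan / period) % 1.0          # fraction of the sweep period
    if x < 0.25:
        tri = 4.0 * x                  # rising from centre to maximum
    elif x < 0.75:
        tri = 2.0 - 4.0 * x            # falling from maximum to minimum
    else:
        tri = 4.0 * x - 4.0            # rising from minimum back to centre
    return centre + max_dev * tri


def signal_present(scan: int, on: int = 15, off: int = 79) -> bool:
    """Signal absent before scan 15 and ceasing at scan 79."""
    return on <= scan < off


def true_cell(scan: int) -> int:
    """Nearest FFT cell to the true frequency (frequency assumed to be in cell units)."""
    return round(true_frequency(scan))


# True cell occupancy for the 100 scans used as tracker input (None = no signal).
track: List[Optional[int]] = [true_cell(t) if signal_present(t) else None
                              for t in range(100)]
print(track)
```

With these parameters, the true track stays between cells 2 and 8, inside the 9-cell gate.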

Figure 1 illustrates the cell occupancy of the true
track. This track was obtained using an HMM
tracker, with data similar to that above, except
that the SNR was increased until no false
detections were obtained when the signal was
present. The track initiates out of, and terminates
into, the zero state, which is indicated by the dots
appearing in the window below the 9 active states
in the gate. Note in this figure the last three
scans for which the signal is present: any tracker
will have difficulty tracking these three
measurements.


FIGURE 1. Track output at high SNR (horizontal axis: scan number).


Figures 2-5 show the input data, the discrete
(Viterbi track) output, and the gate occupancy
probability of the HMM and HMM/A trackers using the
two different data sets. For the first data set,
both trackers initiate early because of spurious
data, but the HMM tracker initiates in the correct
state while the HMM/A tracker does not. Both tracks
terminate three cells early. For the second data
set, the HMM tracker fails to track the variation,
but this is hardly surprising when one looks at the
input data. The HMM/A tracker, on the other hand,
has additional information, viz. the amplitude of
the measurements, and this information is critical
to good tracking in this data set. The HMM/A
tracker initiates correctly, but terminates six
scans early.

Figures 2-5 show very clearly the effects of
statistical variations between the two data sets,
which can result in significantly different
performance. This shows that both the HMM and
HMM/A trackers need to be optimized for good
performance in the ensemble statistical sense. Such
an optimization procedure will require the
development of ensemble statistical tracker
performance measures, which could then be used to
compare the performance of different trackers. This
is the subject of future work.

The input measurement data used for the HMM and
HMM/A trackers are different even when they
originate from the same data set. This arises
because of the different detection thresholds used
for HMM and HMM/A. A finite detection threshold is
essential for the proper operation of the HMM
tracker, since the threshold level is an input to
the HMM algorithm and indicates the significance of
an individual detection. For these examples, the
best threshold for the HMM tracker was found to be
1.8 times the average noise level. For the HMM/A
tracker, the significance of each measurement is
contained in the amplitude information. Raising the
detection threshold in this case therefore tends to
reduce the information available to the tracker,
and the threshold is best kept small (Barrett &
Streit, 1989).

FIGURE 2. Input data, discrete track output & GOP
for HMM tracker; data set 1.

FIGURE 3. Input data, discrete track output & GOP
for HMM/A tracker; data set 1.

FIGURE 4. Input data, discrete track output & GOP
for HMM tracker; data set 2.

FIGURE 5. Input data, discrete track output & GOP
for HMM/A tracker; data set 2.

(Horizontal axis of Figures 2-5: scan number.)

Figures 6-9 show the input data, the track output,
and the logarithm (base 10) of the estimated
covariance of the frequency estimate for PDA
trackers that use single and multiple gate
measurements for the same two data sets. Note that
even for a single data set there are slight
differences in the single gate measurement input
data used for the PDA and HMM trackers. This is due
to the use of different detection thresholds. It
was not possible to find a common detection
threshold for use with both PDA and HMM, because
PDA required a higher threshold than HMM for
successful tracking.

The PDA tracker implemented here is a causal
tracker, i.e., it has no lag, and consequently one
expects some delay in following dynamic track
variations. The reluctance of the PDA tracker to
change state resulted in tracks that lingered in
the wrong state for up to five scans before
switching to the correct state. The severity of
this problem might be reduced by putting some lag
into the PDA tracker (Mahalanabis, Prasad & Garg,
1986).

FIGURE 6. Input data, track output & log covariance
for PDA tracker with single gate measurements;
data set 1.

FIGURE 7. Input data, track output & log covariance
for PDA tracker with single gate measurements;
data set 2.

FIGURE 8. Input data, track output & log covariance
for PDA tracker with multiple gate measurements;
data set 1.

FIGURE 9. Input data, track output & log covariance
for PDA tracker with multiple gate measurements;
data set 2.

(Horizontal axis of Figures 6-9: scan number.)

The most significant problems with our PDA tracker
are the issues of track initiation and track
termination. In these examples the tracker was
initiated in cell five, but the signal did not
exist, so the tracker followed noise until scan 15.
Similarly, the tracker followed noise after the
signal terminated at scan 79. When the PDA
tracker is following noise, one would expect the
covariance of the frequency estimate to be large.
However, as can be seen in these figures, there is
no way to set a threshold on the estimated
covariance that would satisfactorily indicate the
presence or absence of a signal.

The multiple gate measurement PDA tracker shows
little evidence of greater stability over the
single gate measurement PDA tracker for these two
data sets. However, other data sets have shown that
multiple measurements appear to give some
robustness to the PDA tracker, making it less
responsive to outliers, but at the cost of
rendering it somewhat less responsive to state
changes.


CONCLUSIONS 

The development of ensemble statistical tracker
performance measures must be accomplished in order
to:

(i) optimise the tracking performance of any
tracker, especially the HMM and HMM/A
trackers;
(ii) provide a basis for comparison of all
frequency line tracking algorithms; and
(iii) provide a means of comparing the automatic
track initiation and termination
characteristics of different algorithms.

These issues are especially important at low SNRs,
because then only ensemble statistical tracking
behaviour is meaningful.

Our PDA algorithm, as currently configured, cannot
satisfactorily initiate or terminate a track using
PDA covariance estimates. The HMM and HMM/A
trackers appear to offer significantly improved
capabilities in this area, but ensemble performance
measures are necessary to quantify track initiation
and termination in a meaningful way.

From the discussion of the examples, it is clear
that the incorporation of amplitude information
into our PDA tracker could offer some improvement
in performance. On the other hand, the incorporation
of multiple gate measurements into both the HMM and
HMM/A trackers could also enhance their
performance. These extensions to our trackers would
make them more directly comparable.




Finally, the incorporation of phase information
into FLTAs is a most promising way of improving
their performance. Similarly, the incorporation of
derivative information into HMM-based FLTAs should
improve their performance because of better signal
modelling.

REFERENCES 

Barrett, R.F., Steele, A.K., and Streit, R.L. (1988). Frequency
line tracking algorithms. Proc. of the NATO Advanced Study
Institute on Underwater Acoustic Data Processing, Kingston,
Ontario, Canada, July 16-29.

Barrett, R.F. and Streit, R.L. (1989). Automatic Detection of
Frequency Modulated Spectral Lines. Proc. Australian Symposium
on Signal Processing and Its Applications, Adelaide, Australia,
April 17-19.

Bar-Shalom, Y. and Fortmann, T.E. (1988). Tracking and Data
Association. Orlando, Florida, Academic Press.

Kopec, G.E. (1986). Formant tracking using hidden Markov models
and vector quantisation. IEEE Trans. Acoustics, Speech and
Signal Processing, Vol. ASSP-34, August 1986, pp. 709-726.

Mahalanabis, A.K., Prasad, S., and Garg, A. (1986). A Smoothing
Algorithm for Improved Tracking in Clutter and Multitarget
Environment. Proc. 1986 American Control Conference, Seattle,
Washington, pp. 908-910.

Streit, R.L. and Barrett, R.F. Frequency Line Tracking Using
Hidden Markov Models. (Submitted for publication.)




Frequency  Line  Tracking  Algorithms 


R.  F.  Barrett,  A.  K.  Steele  and  R.  L.  Streit 




FREQUENCY  LINE  TRACKING  ALGORITHMS 


R.F.  BARRETT  &  A. K.  STEELE 

Weapons  Systems  Research  Laboratory,  Defence  Science  and  Technology 
Organisation,  PO  Box  1700,  Salisbury,  South  Australia  5108,  Australia. 

R.L.  STREIT 

Naval  Underwater  Systems  Center,  New  London,  Connecticut  06320-5594,  USA. 
(On  exchange  to  Weapons  Systems  Research  Laboratory). 

1.  INTRODUCTION 

This  paper  has  several  purposes.  The  first  purpose  is  to  compare  via 
simulation  the  performance  of  six  different  frequency  line  tracking 
algorithms  (FLTA's)  when  used  in  conjunction  with  a  simple  threshold 
detector. The second purpose is to show that the probabilistic data
association  (PDA)  method  for  handling  multiple  detections  is  not  limited 
to  the  Kalman  filter  context  in  which  it  has  hitherto  been  presented.  In 
particular,  we  present  a  PDA  alpha-beta  tracker  that  handles  multiple 
detections  without  sacrificing  algorithmic  simplicity.  The  third  purpose 
is  to  discuss  a  new  tracker  based  on  hidden  Markov  models  (HMM's).  An 
important  and  intrinsic  feature  of  the  HMM  tracker  is  that  it  initiates 
and  terminates  tracks  automatically. 

In  this  paper,  we  treat  the  problem  of  tracking  the  time  variation  of 
the  (instantaneous)  frequency  of  an  isolated  tone  embedded  in  additive 
white  noise  as  a  post-detection  process  applied  to  the  evolving  short  term 
Fourier  spectra  of  the  sampled  time  series.  Separating  the  tracking 
problem  from  the  detection  problem  can  lead  to  suboptimal  tracking 
performance, but it is an approach commonly used in practice.

The  choice  of  detector  is  also  important,  but  throughout  this  paper  we 
use  a  simple  threshold  detector  because  of  its  widespread  usage  and  ease 
of  implementation.  No  interpolation  is  employed  to  smooth  the  intrinsic 
quantization  effects  of  this  detector  because  interpolation  is  not 
justified  at  low  SNR.  Other  well  known  problems  associated  with  the 
threshold  detector  are  outliers  (false  detections)  and  missed  detections. 
These  problems  cause  serious  tracking  errors  in  the  conventional  trackers 
studied in this paper; however, as the examples will show, PDA trackers
and  the  HMM  tracker  are  robust  against  outliers  and  missed  detections  and 
are  capable  of  tracking  down  to  the  input  quantization  level. 

The three conventional trackers studied in this paper are the alpha-beta
[1], Kalman [2], and fixed lag Kalman smoothing trackers [3]. Each
requires  as  input  a  single  detection.  The  tracker  input  is  the  centre 
frequency  of  the  FFT  cell  with  the  largest  amplitude;  however,  if  the 
detection  threshold  is  not  exceeded,  no  frequency  measurement  is  made  for 
the  current  scan,  or  block  of  time  series  data. 

The two PDA trackers studied in this paper are the PDA alpha-beta and
the PDA Kalman [4] trackers. (A PDA fixed lag Kalman smoothing tracker is
also possible [5], but we have not yet implemented it.) All PDA trackers
accept  multiple  detections  as  input;  i.e.,  the  tracker  input  is  the  set 
(possibly  empty)  of  centre  frequencies  of  all  FFT  cells  whose  amplitude 
exceeds  the  detection  threshold. 




The last tracker studied in this paper is the HMM tracker [6]. It can
accept  as  input  either  multiple  detections  or  the  strongest  detection; 
however,  the  version  studied  here  uses  the  same  input  as  the  conventional 
trackers  mentioned  above.  It  is  unique  among  the  trackers  studied  in  this 
paper  in  its  ability  to  initiate  and  terminate  tracks  automatically. 

2.  BRIEF  DESCRIPTION  OF  THE  TRACKERS 

2.1 The Conventional Trackers

The conventional tracking algorithms accept a single input frequency
measurement and generate two output quantities: the track frequency
estimate $f$ and its time derivative $\dot{f}$ for the current scan. The
mathematical form of the track dynamic models assumes that $\dot{f}$ is
constant over the scan update interval, a reasonable assumption if the change
in $\dot{f}$ over the scan update interval is small. The two Kalman trackers
modify this simple dynamic model for $\dot{f}$ by corrupting it with additive
Gaussian process noise; however, the alpha-beta tracker does not explicitly
include a process noise term. Process noise accounts for mismatch between the
assumed track dynamic model and the true track dynamics. The fixed lag Kalman
smoothing tracker [3] differs from the other two conventional trackers in
its use of track measurements from scans in advance of the (fixed
lag) estimation point to improve the output track estimate.

The  problem  with  conventional  trackers  is  that  they  are  linear  systems. 
Consequently,  their  response  to  outliers  is  governed  by  their  impulse 
response  function,  while  their  response  to  missed  detections  is  by 
comparison  much  less  important.  Outliers  and  missed  detections  are  common 
at  low  SNR,  so  optimising  a  conventional  tracker  is  essentially  equivalent 
to  optimising  its  impulse  response  function.  One  way  to  avoid  the  impulse 
response  function  issue  is  to  avoid  trackers  that  are  linear  functions  of 
the  measurement  sequence.  The  PDA  trackers  and  the  HMM  tracker  discussed 
below  are  nonlinear  systems,  and  their  response  to  outliers  is  much 
superior  to  that  of  the  conventional  trackers. 
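The linearity argument can be checked numerically with a textbook alpha-beta tracker. The sketch below is a generic constant-derivative alpha-beta filter, not the authors' implementation. The gains, the unit scan interval, and the rule of coasting on the prediction for missed detections (`None` inputs) are assumptions. Because every update is linear in the measurements, the deviation caused by a single outlier is proportional to the outlier's size. In other words, it is a scaled copy of the tracker's impulse response.

```python
from typing import List, Optional, Sequence, Tuple


def alpha_beta_track(meas: Sequence[Optional[float]], alpha: float, beta: float,
                     dt: float = 1.0, f0: float = 0.0,
                     fdot0: float = 0.0) -> List[Tuple[float, float]]:
    """Return (f, fdot) after each scan of a constant-derivative alpha-beta tracker."""
    f, fdot = f0, fdot0
    out = []
    for z in meas:
        f_pred = f + fdot * dt             # constant-derivative prediction
        if z is None:                      # missed detection: coast on the prediction
            f = f_pred
        else:
            r = z - f_pred                 # innovation
            f = f_pred + alpha * r
            fdot = fdot + (beta / dt) * r
        out.append((f, fdot))
    return out


def outlier_response(size: float, alpha: float = 0.5, beta: float = 0.1,
                     n: int = 10, k: int = 2) -> List[float]:
    """Deviation of the frequency estimate caused by one outlier of `size` at scan k."""
    clean = [5.0] * n
    dirty = list(clean)
    dirty[k] += size
    a = alpha_beta_track(clean, alpha, beta, f0=5.0)
    b = alpha_beta_track(dirty, alpha, beta, f0=5.0)
    return [fb - fa for (fa, _), (fb, _) in zip(a, b)]


r4, r8 = outlier_response(4.0), outlier_response(8.0)
print([round(x, 3) for x in r4])
print([round(x, 3) for x in r8])
```

Doubling the outlier doubles the entire deviation sequence. A linear tracker has no way to discount a measurement that is implausibly far from the prediction.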

2.2  The  PDA  Trackers 

We  present  PDA  as  a  method  for  converting  a  single-input-single-output 
(SISO)  tracker  into  a  multiple-input-single-output  tracker.  PDA  assumes 
that  only  one  of  the  multiple  input  measurements  corresponds  to  the  target 
being  tracked.  It  further  assumes  that  measurements  in  the  next  scan  will 
be  normally  distributed  with  the  mean  and  covariance  predicted  from  the 
current  scan.  PDA  is  thus  applicable  to  any  SISO  tracker  in  which 
measurement  mean  and  covariance  at  the  next  scan  are  predicted. 

The conventional Kalman tracker predicts the target state and error
covariance, and the predicted measurement mean and covariance follow from
the general Kalman system equations. On the other hand, the alpha-beta
tracker predicts the target state, but not the error covariance. For the
FLTA, we interpret the target state as the predicted measurement mean and
supplement the constants $\alpha$ and $\beta$ with another constant $\sigma^2$
denoting the covariance of the measurements in the next scan. The PDA
alpha-beta tracker is completely specified by $(\alpha, \beta, \sigma)$.

Track  input  measurements  are  gated,  thus  creating  the  possibility  of 
false  dismissal  of  the  target  measurement.  Using  the  PDA  assumptions,  the 
probability  of  false  dismissal  is  easily  evaluated.  The  usual  gated  PDA 
method  uses  a  variable  gate  to  achieve  constant  probability  of  false 
dismissal.  For  the  FLTA,  we  use  a  fixed  gate  so  the  false  dismissal 
probability  varies  from  scan  to  scan. 

Let $\beta_i$ be the probability that the $i$-th measurement corresponds to the
target, and let $\beta_0$ be the probability that none of the measurements
corresponds to the target. Similarly, let $f_i$ be the track frequency
estimate generated by using the $i$-th measurement in the underlying SISO
tracker, and let $f_0$ be the predicted track frequency when no measurement
is made. Then the PDA tracker output is given by

$$ f_{PDA} = \beta_0 f_0 + \sum_{i} \beta_i f_i . $$

For the PDA alpha-beta tracker, the necessary predicted measurement mean
and covariance are the obvious ones obtained from $f_{PDA}$ and
$(\alpha, \beta, \sigma)$. For the PDA Kalman tracker, the error covariance
associated with $f_{PDA}$ is also computed (see [4]), and it is used with
$f_{PDA}$ to make the measurement mean and covariance predictions in the
underlying SISO Kalman filter.

The probabilities $\{\beta_i\}$ are easily computed and require only the
evaluation of a truncated Gaussian density function. The nonlinear
dependence of all PDA trackers on the measured track input data is due to
the nonlinear dependence of the probabilities $\{\beta_i\}$ on the data.
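One scan of a PDA alpha-beta tracker specified by $(\alpha, \beta, \sigma)$ can be sketched as follows. Each $f_i$ comes from the same alpha-beta update, $f_0$ is the prediction, and the weights sum to one. The PDA output therefore reduces to applying the alpha-beta gains to the weighted innovation. Several elements are assumptions for illustration: the weight formula (the parametric PDA form with clutter density `lam`, detection probability `pd` and gate probability `pg`), the defaults, and the function name. Gating is also assumed to have been applied already. In the code, `beta` is the alpha-beta gain and `weights` holds the $\beta_i$.

```python
import math
from typing import List, Tuple


def pda_alpha_beta_step(f: float, fdot: float, meas: List[float],
                        alpha: float, beta: float, sigma: float, dt: float = 1.0,
                        lam: float = 0.1, pd: float = 0.9,
                        pg: float = 0.99) -> Tuple[float, float]:
    """One scan of a PDA alpha-beta tracker; `meas` holds the gated detections."""
    f_pred = f + fdot * dt                          # predicted measurement mean (f_0)
    s = sigma * sigma                               # constant predicted variance
    nus = [z - f_pred for z in meas]                # innovations
    e = [math.exp(-0.5 * v * v / s) for v in nus]   # Gaussian likelihood kernels
    b = lam * math.sqrt(2.0 * math.pi * s) * (1.0 - pd * pg) / pd  # "no target" term
    total = b + sum(e)
    weights = [ei / total for ei in e]              # beta_1..beta_N; beta_0 = b / total
    nu = sum(w * v for w, v in zip(weights, nus))   # combined innovation
    # With f_i = f_pred + alpha * nu_i, this equals beta_0 f_0 + sum_i beta_i f_i.
    return f_pred + alpha * nu, fdot + (beta / dt) * nu


print(pda_alpha_beta_step(5.0, 0.0, [5.3, 7.5], alpha=0.5, beta=0.1, sigma=1.0))
```

With an empty gate, the step simply coasts on the prediction, which is how missed detections are handled without any special case.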

2.3  The  HMM  Tracker 

HMM's  are  probabilistic  models  that  are  commonly  used  in  speech 
applications.  Their  utility  in  tracking  applications  seems  not  to  be 
recognised  in  the  general  literature,  except  for  a  paper  by  Kopec  [7]  who 
uses  them  to  track  formants,  or  resonances,  in  spoken  words.  The  HMM 
tracker  presented  in  this  paper  is  similar  to  Kopec's  formant  tracker; 
however,  the  FLTA  application  permits  the  analytical  development  of  the 
parameters  defining  the  underlying  HMM. 

The  HMM  tracker  is  a  fixed  interval  FLTA;  i.e.,  it  takes  a  fixed  length 
sequence,  or  "window",  of  measurements  and  outputs  a  track  estimate  for 
each  time  in  the  window.  By  sliding  the  window  along  as  new  data  are 
collected,  the  track  estimate  evolves  in  time.  Alternatively,  one  may 
simply  increase  the  window  size.  Either  way,  the  HMM  tracker  is  used  only 
to  compute  the  output  track  estimate  for  each  given  tracking  window. 

The  quantised  frequency  track  is  modelled  as  a  finite  state  Markov 
chain.  A  "faded"  state  represents  a  track  whose  SNR  is  less  than  the 
tracker  SNR  (see  below);  the  remaining  "active"  states  represent  a  track 
occupying  an  FFT  cell  inside  a  fixed  gate  and  having  an  SNR  greater  than 
the  tracker  SNR.  Track  initiation  is  defined  as  a  transition  from  the 
faded  state  to  any  active  state,  while  track  termination  is  defined  as  a 
transition  from  any  active  state  into  the  "faded"  state.  Initiation  and 
termination  of  tracks,  as  well  as  movement  of  the  track  within  the  gate, 
are therefore governed by the transition probability matrix, A, of the
Markov  chain. 

Measurements  of  the  frequency  track  are  characterised  by  a  detection 
probability  matrix,  B.  Thus,  for  each  possible  target  state,  the 
threshold  detector  outputs  are  measured  target  states  with  probabilities 
that  are  computed  analytically  from  the  SNR  and  the  threshold.  The  SNR 
assumed  for  this  B-matrix  calculation  is  called  the  tracker  SNR;  in 
effect,  it  is  the  lowest  SNR  at  which  tracks  are  initiated  and  estimated. 

The HMM tracker is fully specified by the A- and B-matrices, together
with the initial state probability vector, π. Initially, π corresponds to
a target in the faded state. This forces automatic track initiation. If
the tracking window slides along as new data are collected, π is updated
using the current HMM tracker output.

The  HMM  tracker  outputs  both  a  discrete  (quantised)  track  and  a 
continuous  track.  The  quantised  track  is  the  Viterbi  track;  i.e.,  of  all 
possible  tracks  (realisations  of  the  Markov  chain),  the  Viterbi  track  is 
the  one  most  likely  to  account  for  the  measurement  sequence.  The 
continuous  track  is  essentially  the  expected  track,  with  the  expectation 
taken over all possible realisations of the Markov chain. Strictly
speaking, the expected track is conditioned on the track not having faded,
as  well  as  on  the  measurements.  The  total  probability  of  a  faded  track 
(conditioned  only  on  the  measurements)  at  each  point  in  the  window  is  thus 
a  necessary  complement  to  the  continuous  track  estimate. 

The discrete output of the HMM tracker is computed using only $n^2 T$
additions, while the continuous output uses $n^2 T$ multiplications, where $n$
is the number of Markov chain states and $T$ is the number of scans in the
window.  The  discrete  and  continuous  algorithms  are  easily  vectorised. 
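
To make the operation count concrete, the following Python sketch implements a generic Viterbi recursion in the log domain, where products of probabilities become sums and each scan costs about $n^2$ additions and comparisons. It is a minimal illustration that assumes the log probabilities are supplied as nested lists; the function name `viterbi_track` and the input layout are illustrative assumptions, not the implementation used in the paper.

```python
import math

def viterbi_track(log_pi, log_A, log_B):
    """Most likely state sequence of a discrete HMM (log-domain Viterbi).

    log_pi[i]   : log initial probability of state i
    log_A[i][j] : log probability of a transition from state i to state j
    log_B[t][j] : log likelihood of the measurement at scan t given state j
    Use -math.inf for zero probabilities.
    """
    n, T = len(log_pi), len(log_B)
    # delta[j]: best log score of any state path ending in state j
    delta = [log_pi[j] + log_B[0][j] for j in range(n)]
    backptr = []  # backptr[t-1][j]: best predecessor of state j at scan t
    for t in range(1, T):
        new_delta, ptr = [], []
        for j in range(n):
            # n additions and comparisons for each destination state j
            best = max(range(n), key=lambda i: delta[i] + log_A[i][j])
            ptr.append(best)
            new_delta.append(delta[best] + log_A[best][j] + log_B[t][j])
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Replacing each `max` over predecessors by a sum of exponentiated scores would give the forward recursion instead, which is why the continuous output needs multiplications rather than additions.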


3. EXAMPLES

All  six  trackers  are  compared  in  Figure  1,  and  the  nonlinear  trackers 
are  compared  in  Figure  2.  The  gated  input  measurement  is  indicated  by  a 
dot,  and  the  tracker  output  is  the  continuous  curve.  The  data  are 
obtained  by  generating  a  sine  wave  triangularly  swept  in  frequency  with  a 
period  of  400  scans,  a  centre  frequency  of  10,  and  a  maximum  deviation  of 
5.  The  added  uncorrelated  noise  level  and  the  detection  threshold  are  set 
to  give  a  reasonable  number  of  false  detections  and  missed  detections. 
The  conventional  and  HMM  trackers  use  the  largest  detection  in  the  gate; 
the  fixed  lag  Kalman  smoother  has  a  lag  of  10  scans;  the  PDA  trackers  use 
all  detections  in  the  gate;  the  HMM  tracker  uses  a  window  of  250  scans. 


FIGURE  1.  Input  data  and  tracker  outputs  for:  a)  alpha-beta,  b)  Kalman, 
c)  fixed  lag  Kalman  smoother,  d)  PDA  alpha-beta,  e)  PDA  Kalman,  f)  HMM. 




The PDA trackers are initiated at the correct track value and the HMM
tracker is initiated in the faded state. In Figure 1 the SNR is constant
over all scans; the figure shows that the nonlinear trackers are robust
against outliers, whereas the conventional trackers are not. In Figure 2 the SNR
starts low, increases linearly to scan 100, is constant to scan 150, and
decreases linearly to scan 250. The track is the same as in Figure 1, but
the false alarm rate is increased by decreasing the detection threshold.
The PDA trackers produce incorrect tracks at low SNRs. The automatic track
initiation and termination of the HMM tracker are clearly evident.


FIGURE  2.  Input  data  and  tracker  outputs  for:  a)  PDA  alpha-beta,  b)  PDA 
Kalman,  c)  continuous  output  HMM,  d)  discrete  output  HMM. 


REFERENCES 

1. Benedict TR, Bordner GW: Synthesis of an Optimal Set of Radar
Track-While-Scan Smoothing Equations. IRE Trans. Automatic Control,
Vol. 7, July 1962, pp. 27-32.

2.  Friedland  B:  Optimum  Steady-State  Position  and  Velocity  Estimation 
Using  Noisy  Sampled  Position  Data.  IEEE  Trans.  Aerospace  and 
Electronic  Systems,  Vol.  AES-9,  November  1973,  pp. 906-911. 

3.  Moore  JB:  Discrete-Time  Fixed-Lag  Smoothing  Algorithms.  Automatica, 
Vol. 9,  1973,  pp. 163-173. 

4.  Bar-Shalom  Y,  Fortmann  TE:  Tracking  and  Data  Association.  Orlando, 
Florida,  Academic  Press,  1988. 

5.  Mahalanabis  AK,  Prasad  S,  Garg  A:  A  Smoothing  Algorithm  for  Improved 
Tracking  in  Clutter  and  Multitarget  Environment.  Proc.  1986  American 
Control  Conference,  Seattle,  WA,  1986,  pp. 908-910. 

6.  Streit  RL,  Barrett  RF:  Frequency  Line  Tracking  Using  Hidden  Markov 
Models.  In  preparation. 

7. Kopec GE: Formant Tracking Using Hidden Markov Models and Vector
Quantisation. IEEE Trans. Acoustics, Speech and Signal Processing,
Vol. ASSP-34, August 1986, pp. 709-728.




Frequency  Line  Tracking 
Using  Hidden  Markov  Models 
With  Phase  Information 


R.  F.  Barrett  and  R.  L.  Streit 




Frequency  Line  Tracking  Using  Hidden  Markov 
Models  with  Phase  Information 

R.F.  Barrett 

Maritime  Systems  Division,  Weapons  Systems  Research  Laboratory,  DSTO,  P.O.  Box  1700,  Salisbury,  Australia  5108 

R.L.  Streit 

Naval  Underwater  Systems  Center,  New  London,  Connecticut,  U.S.A. 


An  extension  to  the  Hidden  Markov  Model  (HMM)  frequency  line  tracker  of  Streit  and  Barrett  (1990)  is  presented.  In 
this  extension,  the  FFT  amplitudes  and  phases  in  a  restricted  set  of  states  centered  on  the  signal  frequency  are  passed 
to  the  tracker  as  an  input.  The  result  is  an  improved  tracking  performance  and  a  new  ability  of  the  tracker  to  follow 
frequency  fluctuations  within  one  FFT  bin. 


1.  Introduction 

The  estimation  of  the  frequency  of  isolated  tones 
embedded  in  noise,  and  the  tracking  of  changes  in  these 
estimated  frequencies  as  a  function  of  time  are  two  related 
topics  that  have  recently  received  considerable  attention  in  the 
signal  processing  literature.  Techniques  for  the  solution  of 
these  problems  have  found  applications  in  many  diverse  fields 
(e.g.,  radar,  sonar,  seismology,  etc). 

In  a  conventional  frequency  tracking  problem,  the 
incoming  data  are  divided  into  blocks  of  contiguous  time 
series.  Fast  Fourier  Transforms  (FFTs)  are  then  performed  on 
each  of  these  blocks  to  obtain  frequency  spectra  as  a  function 
of  time.  The  resolution  of  the  frequency  spectra  is  restricted  to 
one  FFT  cell  (or  the  reciprocal  of  the  data  acquisition  time  for 
the  FFT  integral).  The  conventional  frequency  tracking 
problem  consists  of  estimating  the  true  signal  frequency  from 
the  FFT  spectral  data. 

In an earlier paper (Streit and Barrett (1990)), we
studied  the  formulation  of  frequency  line  tracking  in  terms  of  a 
Hidden  Markov  Model  (HMM).  In  this  approach,  the 
frequency  domain  considered  by  the  tracker  is  restricted  to  a 
subset,  or  gate,  of  the  available  FFT  cells.  For  each  time  block, 
the  cell  containing  the  maximum  power  within  the  gate, 
provided  that  power  exceeds  a  detection  threshold,  is  passed  to 
the  HMM  tracker.  Based  on  a  priori  knowledge  of  some 
specified  parameters  describing  the  statistical  nature  of  the 
track,  the  HMM  tracker  reconstructs  the  time  variation  of  the 
signal  frequency  within  the  gate.  A  zero  state  is  included  to 
allow  for  the  case  when  the  signal  terminates  or  the  signal 
frequency  wanders  outside  of  the  prescribed  gate.  Full  details 
of  the  HMM  frequency  line  tracker  are  contained  in  an  earlier 
publication. 
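
The basic measurement described above can be sketched in a few lines of Python. The sketch assumes that the spectral power of one time block is available as a list, that the gate is a contiguous index range, and that 0 is returned when the gated maximum does not exceed the threshold (mirroring the zero state); the function and parameter names are hypothetical.

```python
def gated_max_cell(power, gate_lo, gate_hi, threshold):
    """Return the 1-based gate cell index of the largest power in the gate,
    or 0 (no detection) if that power does not exceed the threshold.

    power     : spectral powers for one time block (all FFT cells)
    gate_lo   : index into power of the first cell in the gate
    gate_hi   : index into power of the last cell in the gate (inclusive)
    threshold : detection threshold applied to the maximum gated power
    """
    gate = power[gate_lo:gate_hi + 1]
    k = max(range(len(gate)), key=lambda i: gate[i])
    return k + 1 if gate[k] > threshold else 0
```

Applying the function block by block produces the measurement sequence passed to the HMM tracker; note that a strong cell just outside the gate is ignored by construction.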

An  extension  of  this  basic  tracker  was  developed  by 
Barrett  and  Streit  (1989)  and  Steele  et  al  (1989).  The  tracker 
was  modified  so  that  it  received  as  input  the  FFT  amplitude  in 
the cell with the maximum spectral power. This extra
information  was  found  to  enhance  its  performance  by  enabling 
tracking  and  detection  at  lower  signal-to-noise  ratios  (SNRs) 
than  without  amplitude  (see  Barrett  and  Streit  (1989)). 


The  input  to  a  frequency  tracker  is  generally  some  form 
of  frequency  estimator,  and  in  this  area  also,  a  number  of  new 
approaches  have  been  developed.  These  methods  offer 
improvements  in  either  accuracy  or  resolution  over 
conventional  spectral  analysis  techniques.  For  the  case  of  a 
single  tone  embedded  in  noise,  the  Phase  Interpolation 
Estimator  (PIE)  and  Generalised  Phase  Interpolation  Estimator 
(GPIE)  provide  near  optimal  frequency  estimates,  even  at  low 
SNR.  For  examples,  see  McMahon  and  Barrett  (1986,  1987). 

In  these  methods,  the  phase  information  from 
successive  FFTs  is  used  to  refine  the  estimate  of  the  signal 
frequency  so  that  frequency  changes  of  less  than  the  width  of 
one  FFT  cell  can  be  measured.  Previous  track  history  is  not 
taken  into  account,  and  the  frequency  estimates  for  different 
time blocks are independent. At low SNR, the 2π ambiguity in
FFT  phase  can  occasionally  result  in  ‘outliers’  far  removed 
from  the  correct  frequency. 
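
The principle behind these phase-based estimators can be illustrated with a noise-free sketch. For a complex (analytic) tone, every sample of the second of two contiguous blocks equals the corresponding sample of the first multiplied by $e^{i\omega N T_s}$, so the DFT phase in any cell advances by $\omega N T_s$ modulo $2\pi$. The Python code below assumes a complex tone, blocks of equal length N, and a direct single-cell DFT; it is a simplified illustration of the phase-difference idea, not the PIE or GPIE algorithms of McMahon and Barrett, and the function names are hypothetical. Because the wrapped phase change lies in $(-\pi, \pi]$, the fine correction is only unambiguous within half a cell of the selected cell, which is where the $2\pi$ ambiguity enters.

```python
import cmath
import math

def dft_cell(x, k):
    """Direct DFT of the sequence x evaluated at integer cell k."""
    N = len(x)
    return sum(x[l] * cmath.exp(-2j * math.pi * k * l / N) for l in range(N))

def phase_difference_freq(x1, x2, fs):
    """Estimate a tone frequency (Hz) from two contiguous blocks x1, x2.

    The coarse estimate is the cell with the largest DFT magnitude in x1;
    the fine correction is the change in DFT phase between the blocks,
    wrapped to (-pi, pi], which resolves offsets of up to half a cell.
    Assumes a complex tone at a positive frequency below fs/2.
    """
    N = len(x1)
    cells = [dft_cell(x1, k) for k in range(N)]
    k = max(range(N), key=lambda i: abs(cells[i]))
    dphi = cmath.phase(dft_cell(x2, k) / cells[k])  # wrapped phase change
    cell_width = fs / N                               # Hz per DFT cell
    return (k + dphi / (2 * math.pi)) * cell_width
```

With noise added, the wrapped phase can jump by nearly $2\pi$, which produces exactly the kind of outlier described above.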

The  purpose  of  the  work  described  in  the  present  paper 
is  to  include  FFT  amplitude  and  phase  information  into  the 
HMM  frequency  tracker  described  by  Streit  and  Barrett  (1990). 
The  added  information  affects  the  performance  in  two  different 
ways.  Firstly,  the  intrinsic  tracker  quantisation  error  can  be 
reduced  so  that  frequency  variations  within  one  FFT  cell  can  be 
readily  tracked.  Secondly,  the  added  information  enables  the 
new  tracker  to  out-perform  the  earlier  tracker,  even  if  the 
intrinsic  tracker  quantisation  error  is  kept  to  one  FFT  resolution 
cell  (as  in  the  earlier  tracker).  The  problem  of  outliers  that 
occurred  with  the  PIE  algorithm  is  avoided  because  these 
extreme  frequency  fluctuations  are  suppressed  by  the  Markov 
chain  process  model  of  the  HMM  tracker.  The  extended  HMM 
tracker  thus  combines  the  high  accuracy  frequency  estimation 
capability  of  the  PIE  algorithm  with  the  accurate  tracking  and 
track  initiation  and  termination  capabilities  of  the  HMM 
tracker. 


2. Frequency Tracking Using HMMs


In  the  frequency  tracking  problem  described  by  Streit 
and Barrett (1990), the range of frequencies, or gate, over
which the track is allowed to wander is divided into a finite
number  m  of  frequency  cells.  A  zero  state  is  included  to 
allow  for  the  possibility  of  the  track  wandering  outside  of  the 
allowed  frequency  range,  or  terminating  altogether. 

The  transitions  that  occur  between  states  as  time 
progresses  are  characterised  by  a  transition  probability  matrix 
A.  The  elements  of  the  A-matrix  are  the  probabilities  of 
transitions  between  the  states  of  the  Markov  chain.  These 
probabilities  depend  on  the  initiation  and  termination 
probabilities  (u  and  v)  of  the  track,  and  on  the  process  noise,  d. 
The  elements  of  the  A-matrix  are  calculated  from  the  basic 
premise  that  the  probability  distribution  for  a  frequency 
change  in  the  tracked  line  is  a  Gaussian  centred  on  zero.  The 
width  of  this  Gaussian  distribution  is  controlled  by  the  process 
noise  d.  In  addition  to  a  change  from  one  frequency  cell  to 
another  within  the  gate,  transitions  may  also  occur  from  within 
the  gate  to  the  zero  state  (track  termination),  or  from  the  zero 
state  to  a  state  within  the  gate  (track  initiation). 
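
The exact expressions for the A-matrix elements are given in Streit and Barrett (1990) and are not reproduced here. As a rough illustration of how u, v and d enter, the Python sketch below builds a transition matrix under a simplified, hypothetical model: uniform initiation into the m gate cells, a constant termination probability, and a zero-mean Gaussian (standard deviation d, in cells) for the cell-to-cell change, normalised over the gate. The function name and this particular construction are assumptions.

```python
import math

def transition_matrix(m, u, v, d):
    """Hypothetical (m+1)x(m+1) transition matrix; state 0 is the zero state.

    Assumed model (for illustration only):
      * zero state -> zero state with prob 1-u; -> each gate cell with prob u/m
      * gate cell j -> zero state with prob v (track termination)
      * gate cell j -> gate cell k with prob (1-v) * w(j,k), where w(j,k) is a
        zero-mean Gaussian in the shift k-j with standard deviation d,
        normalised over the gate cells.
    """
    A = [[0.0] * (m + 1) for _ in range(m + 1)]
    A[0][0] = 1.0 - u
    for k in range(1, m + 1):
        A[0][k] = u / m
    for j in range(1, m + 1):
        w = [math.exp(-0.5 * ((k - j) / d) ** 2) for k in range(1, m + 1)]
        total = sum(w)
        A[j][0] = v
        for k in range(1, m + 1):
            A[j][k] = (1.0 - v) * w[k - 1] / total
    return A
```

Each row sums to one, and a small d concentrates the probability on the diagonal, which corresponds to a slowly wandering line.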

The  specification  of  the  measurement  probability 
matrix  B  is  also  necessary  to  define  the  HMM.  In  the  basic 
HMM  tracker  of  Streit  and  Barrett  (1990),  a  measurement  is 
defined  as  the  specification  of  the  frequency  cell  in  which  the 
maximum  spectral  power  resides,  provided  that  power  exceeds 
a  prescribed  threshold.  An  element  of  the  B-matrix  is  then  the 
likelihood  of  such  a  detection  in  one  of  the  cells  of  the  gate, 
conditioned  on  the  true  signal  residing  in  a  particular  state  of 
the  Markov  chain.  The  calculation  of  the  elements  of  the  B- 
matrix  is  carried  out  under  the  assumption  that  the  time  series 
data  comprise  a  constant  frequency  sinusoidal  signal  embedded 
in  white  Gaussian  noise.  The  likelihoods  depend  on  the  SNR 
of  the  line  being  tracked  and  on  the  measurement  detection 
threshold. 

The  last  remaining  quantity  to  be  specified  to  the 
tracker is the initial state probability vector π. This vector
specifies  the  occupation  probabilities  of  the  various  Markov 
states  at  time  zero. 

There  are  three  outputs  from  the  HMM  tracker 
described  in  Streit  and  Barrett  (1990).  The  first  is  the  Viterbi 
track,  which  is  the  optimal  state  sequence  (in  a  maximum 
likelihood  sense)  conditioned  on  the  measurement  sequence. 
The  Viterbi  track  is  thus  a  discrete  output.  The  second  output, 
which  is  continuous,  is  the  Mean  Cell  Occupancy  (MCO) 
track.  From  the  A  and  B  matrices,  the  likelihood  of  the 
occupancy  associated  with  each  Markov  state  as  a  function  of 
time  can  be  calculated.  The  mean  cell  occupancy  (for  cells 
within  the  gate),  and  the  associated  standard  deviation  can  then 
be  obtained.  The  third  output  of  the  HMM  tracker  is  the  Gate 
Occupancy  Probability  (GOP).  The  GOP  function  is  the 
probability,  as  a  function  of  time,  of  a  frequency  with  the 
required  parameters  lying  within  the  selected  gate.  The 
complementary  function  to  the  GOP  (i.e.,  1  -  GOP)  is  the 
occupation  probability  of  the  zero  state. 
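
Given the state occupancy probabilities at each time, the GOP, the MCO and its standard deviation follow from simple moments. The Python sketch below assumes the probabilities are supplied as rows `F[t]` with index 0 for the zero state and 1..m for the gate cells. It takes the MCO as the mean cell index conditioned on the signal lying in the gate, which is one reading of the description above; the function name is hypothetical.

```python
import math

def track_outputs(F):
    """Compute GOP, MCO and its standard deviation from occupancy probabilities.

    F[t][i] : probability that the signal occupies state i at time t
              (state 0 = zero state, states 1..m = gate cells), rows sum to 1.
    Returns three lists (gop, mco, std); mco and std are None when gop is 0.
    """
    gop, mco, std = [], [], []
    for row in F:
        g = sum(row[1:])                  # gate occupancy probability
        gop.append(g)
        if g <= 0.0:
            mco.append(None)
            std.append(None)
            continue
        mean = sum(i * p for i, p in enumerate(row) if i > 0) / g
        second = sum(i * i * p for i, p in enumerate(row) if i > 0) / g
        mco.append(mean)
        std.append(math.sqrt(max(second - mean * mean, 0.0)))
    return gop, mco, std
```

The two traces plotted one standard deviation above and below the MCO track would then be `mco[t] + std[t]` and `mco[t] - std[t]`.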

3.  Inclusion  of  FFT  Phase 

The performance of any tracker clearly depends on the
amount of information that is available to it. In this section
we describe an HMM tracker in which the concept of a 'measurement'
is further extended beyond that of Streit and Barrett
(1990) and Barrett and Streit (1989). For each block of time
series data, the amplitude and phase of the FFT for each cell
within the gate are determined. The complete specification of
the FFT amplitudes and phases in all gate cells for two
contiguous blocks of time series data is deemed a 'measurement'.
As before, the B-matrix is interpreted as the likelihood of any
particular measurement, conditioned on each of the underlying
Markov states.

Following Streit and Barrett (1990), it is assumed that the
data time series is of the form:

$$z(t_0 + kT_s) = A \sin\left[\omega (t_0 + kT_s) + \xi\right] + n_k \qquad (1)$$

where $t_0$ is the initial time and $T_s$ is the sampling period. The
amplitude $A$, phase $\xi$ and angular frequency $\omega$ are assumed
to remain constant over the period $NT_s$, which is the data
acquisition time for a Fourier transform of size $N$. The noise
$n_k$ is taken to be zero mean and Gaussian, with a
variance of $\sigma^2$, so that

$$\langle n_k n_l \rangle = \delta_{kl}\,\sigma^2 \qquad (2)$$

where $\delta$ denotes the Kronecker delta.

The discrete Fourier transform $y(\omega)$ at angular frequency
$\omega$ of the time series in Eq. (1) is defined in Streit and Barrett
(1990) and can be expressed in the form:

$$y(\omega) = R\,e^{i\eta} = C\,e^{i\phi} + D\,e^{i\theta} \qquad (3)$$

where $R$ and $\eta$ represent the amplitude and phase of $y(\omega)$. The
amplitudes and phases of the signal and noise components are
denoted by $(C, \phi)$ and $(D, \theta)$ respectively.

The joint probability density function (PDF) of $R$ and $\eta$
has the form

$$p(R, \eta) = \frac{R}{2\pi\tilde{\sigma}^2} \exp\left[-\frac{R^2 - 2CR\cos(\eta - \phi) + C^2}{2\tilde{\sigma}^2}\right] \qquad (4)$$

where $\tilde{\sigma}^2 = \sigma^2/2N$.

The signal phase $\phi$ differs for the different time blocks. If
we define $\phi_1$ to be the phase corresponding to the signal for
the first of two contiguous time blocks, and $\phi_2$ to be the phase
corresponding to the second time block, then

$$\phi_2 \approx \phi_1 + \omega N T_s \qquad (5)$$

For  real  signals,  such  as  that  defined  in  (1),  the  approximation 
in  (5)  is  only  valid  at  frequencies  far  removed  from  zero. 
However,  if  the  analytic  signal  is  used  instead  of  the  real 
signal,  eq.  (5)  becomes  valid  at  all  frequencies. 

Eq. (5) implies that the angular frequency $\omega$ is related not
to the phase in each cell, but to the difference in the phases
in the cell containing the signal for the two contiguous time
blocks. We are thus interested more in the phase change in
a cell as time progresses than in the absolute value of the
phase. Our concept of a 'measurement' is thus extended to
include the simultaneous specification of the quantities $R_{n1}$,
$R_{n2}$ and $\eta_{n2} - \eta_{n1}$, where $R_{n1}$ and $\eta_{n1}$ are the amplitude and
phase of the fast Fourier transform in cell $n$ at time $t = t_0$,
and $R_{n2}$ and $\eta_{n2}$ are the corresponding quantities at time $t = t_0 + NT_s$,
where $1 \le n \le m$. With this definition of a measurement,
the overlapping of successive block pairs causes successive measurements to be correlated,
which  is  contrary  to  the  HMM  model.  This  contradiction  is 
ignored  here,  but  the  overlapping  can  easily  be  removed  if  it 
results  in  difficulties. 

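The likelihood of the measurement, conditioned on an
initial signal amplitude $A$ and frequency $\omega$, can be shown
to have the form:

$$B(\omega) = \prod_{n=1}^{m} p\left(R_{n1}, R_{n2}, \eta_{n2} - \eta_{n1} \mid A, \omega\right) \qquad (6)$$

where

$$p\left(R_{n1}, R_{n2}, \eta_{n2} - \eta_{n1} \mid A, \omega\right) = \frac{R_{n1} R_{n2}}{2\pi\tilde{\sigma}^4} \exp\left[-\frac{R_{n1}^2 + R_{n2}^2 + 2C^2}{2\tilde{\sigma}^2}\right] I_0\!\left(\frac{CZ}{\tilde{\sigma}^2}\right) \qquad (7)$$

In Eq. (7), $I_0$ represents the modified Bessel function of the first
kind of order zero, $C$ is the signal amplitude of Eq. (3) corresponding
to $A$, and

$$Z = \sqrt{R_{n1}^2 + R_{n2}^2 + 2R_{n1}R_{n2}\cos\left(\eta_{n2} - \eta_{n1} - \omega N T_s\right)} \qquad (8)$$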

The  implementation  of  the  HMM  tracker  with  phase  and 
amplitude  included  differs  from  the  earlier  implementation 
described in Streit and Barrett (1990) in a number of important
points.  Firstly,  because  the  presence  of  phase  information 
enables  frequencies  to  be  estimated  to  a  greater  accuracy  than 
one  FFT  cell,  the  states  of  the  HMM  need  not  coincide  with  the 
FFT  cells.  In  the  present  work,  the  frequency  range  spanned  by 
each  HMM  state  is  arbitrary.  The  state  width  and  process  noise 
d  can  be  adjusted  to  reflect  the  increased  estimation  accuracy, 
particularly  at  high  SNRs,  allowed  by  the  phase  information. 
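
Eqs. (6)-(8) are most conveniently evaluated in logarithms, since the exponential and Bessel factors overflow quickly at high SNR. The Python sketch below is a minimal illustration: it treats the DFT signal amplitude C and the noise variance $\tilde{\sigma}^2$ as known inputs, and evaluates $\log I_0$ with the trapezoidal rule over one period (accurate for periodic integrands) scaled by $e^{-x}$; in practice a library routine such as SciPy's exponentially scaled `scipy.special.i0e` would normally be used. All function names are hypothetical.

```python
import math

def log_i0(x, points=2000):
    """log I0(x) for x >= 0, via I0(x) = (1/2pi) * integral of exp(x cos t)
    over one period (trapezoidal rule), scaled by exp(-x) against overflow."""
    s = sum(math.exp(x * (math.cos(2 * math.pi * j / points) - 1.0))
            for j in range(points))
    return x + math.log(s / points)

def log_pair_likelihood(r1, r2, dphase, C, omega, NTs, sig2):
    """Log of Eq. (7) for one cell.

    r1, r2 : DFT amplitudes R_n1, R_n2 of the cell in the two blocks (> 0)
    dphase : phase change eta_n2 - eta_n1 of the cell between the blocks
    C      : DFT signal amplitude (Eq. 3); omega: hypothesised frequency
    NTs    : block duration N*T_s; sig2: sigma-tilde^2 of Eq. (4)
    """
    Z = math.sqrt(max(r1 * r1 + r2 * r2
                      + 2 * r1 * r2 * math.cos(dphase - omega * NTs), 0.0))
    return (math.log(r1 * r2 / (2 * math.pi * sig2 * sig2))
            - (r1 * r1 + r2 * r2 + 2 * C * C) / (2 * sig2)
            + log_i0(C * Z / sig2))

def log_B(cells, C, omega, NTs, sig2):
    """Log of Eq. (6): sum of per-cell log likelihoods over the gate.

    cells : list of (R_n1, R_n2, eta_n2 - eta_n1) tuples, n = 1..m
    """
    return sum(log_pair_likelihood(r1, r2, dp, C, omega, NTs, sig2)
               for r1, r2, dp in cells)
```

Since $I_0$ is increasing, the per-cell term is largest when $\omega N T_s$ matches the measured phase change, which is how the phase information sharpens the frequency estimate.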

The  second  difference  lies  in  the  fact  that  the  likelihood 
function in Eq. (6) can be highly peaked as a function of $\omega$ if the
SNR is fairly high. The likelihoods calculated at the centre
frequencies of the gate cells may therefore not represent the
average likelihood over the span of each state accurately enough.
This effect can be countered either by selecting
the frequency span of each state to be significantly smaller
than the spread in the likelihood function, or by
integrating $B(\omega)$ over the frequency span of the state. The
particular  selection  will  depend  on  the  accuracy  desired  in  the 
frequency  track. 
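
The second option, averaging the likelihood over the span of a state rather than sampling it at the centre, can be sketched with a midpoint rule. The Python helper below is hypothetical; it returns the integral of `lik` over the state divided by the state width, which differs from the integral only by a constant factor when all states have the same width.

```python
import math

def state_average(lik, lo, hi, samples=32):
    """Average of lik(omega) over the state span [lo, hi] (midpoint rule)."""
    h = (hi - lo) / samples
    return sum(lik(lo + (j + 0.5) * h) for j in range(samples)) / samples
```

For example, a likelihood peak of width 0.02 located at 0.35 inside a state spanning [0, 1] is almost invisible at the centre value 0.5, but contributes about 0.05 to the state average.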


4.  Discussion  of  Results 


Results  of  the  application  of  the  HMM  frequency  line 
tracker  to  two  sets  of  simulated  data  are  presented  in  Figs.  1 
and  2.  In  Fig.  1,  the  results  from  three  different  versions  of  the 
tracker are compared. These versions are those of i) Streit and
Barrett  (1990)  in  which  no  FFT  amplitude  or  phase  information 
is  passed  to  the  tracker;  ii)  Barrett  and  Streit  (1989)  in  which 
the  FFT  amplitude  from  the  cell  containing  the  maximum 
power  is  used;  and  iii)  the  tracker  presented  here  where  FFT 
amplitudes  and  phases  in  all  the  cells  are  passed  to  the  tracker. 
For  convenience,  these  three  trackers  are  designated  HMM, 
HMM/A  and  HMM/AP  respectively. 


Fig. 1a shows an intensity modulated representation of
the spectral power in the cells of the gate as a function of time.
Frequency increases in the vertical direction, while time
increases horizontally. Each cell is 1 Hz wide in the frequency
direction and 1 s long in the time direction. The total time
window spans 100 seconds and there are 9 cells in the
frequency gate. The parameters of the HMM are listed in the
figure caption. The SNR is defined as $A^2/2\sigma^2$. The signal
consisted  of  a  frequency  modulated  tone  with  the  modulation 
having  the  form  of  a  sinusoid  of  amplitude  2  Hz  and  period  50 
seconds. 
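
Test data of this kind can be generated with a short simulation. The Python sketch below assumes a sampling rate of 1024 Hz (from N = 1024 and NT_s = 1 s in the Fig. 1 caption), takes the centre frequency `fc` as a free, hypothetical parameter (it is not stated above), and sets the noise level from SNR = $A^2/2\sigma^2$. The phase is the integral of the instantaneous frequency $f_c + 2\sin(2\pi t/50)$ Hz.

```python
import math
import random

def fm_test_signal(fc, dev=2.0, period=50.0, duration=100.0, fs=1024.0,
                   A=1.0, snr=0.008, seed=0):
    """Sinusoidally frequency-modulated tone in white Gaussian noise.

    Instantaneous frequency: fc + dev*sin(2*pi*t/period) (Hz).
    SNR is A^2 / (2 sigma^2), so sigma = A / sqrt(2 * snr).
    Returns (samples, sigma).
    """
    rng = random.Random(seed)
    sigma = A / math.sqrt(2.0 * snr)
    n = int(round(duration * fs))
    x = []
    for k in range(n):
        t = k / fs
        # 2*pi times the integral of the instantaneous frequency from 0 to t
        phase = (2 * math.pi * fc * t
                 + dev * period * (1 - math.cos(2 * math.pi * t / period)))
        x.append(A * math.sin(phase) + rng.gauss(0.0, sigma))
    return x, sigma
```

With SNR = 0.008, $10\log_{10}(0.008) \approx -20.97$ dB, consistent with the -21 dB quoted in the caption.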

The measurement sequence, which is input to the basic
tracker, is displayed in Fig. 1b. The effect of the noise is clear
in the appearance of the track. In Figs. 1c-1e, the Viterbi
tracks arising from the HMM, HMM/A and HMM/AP trackers
are displayed. The zero state is shown below the other state
cells. The quality of the output improves from Fig. 1c to Fig. 1e
as the amount of input information presented to the tracker increases.
The improved quality manifests itself in fewer abrupt terminations and
re-initiations of the track (i.e., entries into the zero state), and in
fewer outliers.

The  results  displayed  in  Fig.  1  are  typical.  However,  a 
quantitative  comparison  of  the  three  trackers  can  only  be  made 
by  comparing  their  average  performance  for  a  statistically 
significant  set  of  realisations  of  time  series  data.  The  results  of 
such  an  investigation  will  be  published  later. 

In Fig. 2, the results for a set of data where the
frequency  modulation  of  the  signal  has  been  reduced  to  0.6  Hz 
amplitude  are  displayed.  All  other  details  are  the  same  as  for 
Fig.  1.  The  frequency  cells  in  the  intensity  modulated  display 
of  Fig.  2a  are  1  Hz  wide.  However,  for  the  displays  2b  and  2c, 
the  frequency  scale  has  been  expanded  by  the  factor  3.33  so 
that  the  frequency  cells  are  now  0.3  Hz  wide. 

Fig. 2b displays the results of the PIE frequency estimation
routine. The general trend of the modulation is apparent in
the  expanded  display,  but  many  outliers  are  observed.  The 
frequency  estimates  that  were  outside  the  range  spanned  by  the 
state  cells  are  shown  in  the  zero  state. 

Figs.  2c  and  2d  show  the  Viterbi  tracks  and  MCO 
outputs  from  the  HMM/AP  tracker.  The  MCO  display  shows 
two  traces  that  are  respectively  one  standard  deviation  above 
and  one  standard  deviation  below  the  MCO  track.  The  GOP 
function  is  displayed  in  Fig.  2e  (ranging  from  values  of  0  to  1) 
and  indicates  a  high  probability  that  the  signal  was  present  at 
all  times. 

The  increased  accuracy  of  the  HMM/AP  tracker  (and 
the  PIE)  arises  from  the  exploitation  of  the  phase  information 
available  in  successive  FFTs.  This  information  was  not 
available  to  the  HMM  and  HMM/A  trackers  and  so  the  fine 
structure  in  the  frequency  modulation  could  not  be  detected  by 
these  trackers. 


5.  Conclusion 


The  HMM  frequency  line  tracker  presented  by  Streit 
and  Barrett  (1990)  has  been  extended  by  the  inclusion  of  FFT 
phase  information  into  the  tracker  input  measurement 
sequence.  As  a  result,  the  tracker  exhibits  an  improved 
performance,  and  is  able  to  accurately  track  smaller  frequency 
changes  than  was  previously  possible. 

References

Barrett R.F. and Streit R.L. (1989)
Automatic Detection of Frequency Modulated Spectral Lines.
Proc. Australian Symp. on Sig. Proc. and Appls., Adelaide,
pp. 283-287.

McMahon D.R.A. and Barrett R.F. (1986)
An Efficient Method for the Estimation of the Frequency of a
Single Tone in Noise from the Phases of Discrete Fourier
Transforms.
Signal Processing, Vol. 11, pp. 169-177.

McMahon D.R.A. and Barrett R.F. (1987)
Generalisation of the Method for the Estimation of the
Frequencies of Tones in Noise from the Phases of Discrete
Fourier Transforms.
Signal Processing, Vol. 12, pp. 371-383.

Steele A.K., Streit R.L. and Barrett R.F. (1989)
Nonlinear Frequency Line Tracking Algorithms.
Proc. Australian Symp. on Sig. Proc. and Appls., Adelaide,
pp. 258-262.

Streit R.L. and Barrett R.F. (1990)
Frequency Line Tracking Using Hidden Markov Models.
IEEE Trans. ASSP, Vol. 38, pp. 586-598.


Fig. 1. Frequency tracks for a sinusoidally modulated (2 Hz
amplitude) tone in noise: a) intensity modulated spectrogram;
b) measurement sequence for basic HMM tracker; c) Viterbi
output for basic HMM tracker; d) Viterbi output for HMM/A
tracker; e) Viterbi output for HMM/AP tracker.

(HMM parameters are u = 0.5, v = 0.1, d = 0.3, Thresh. = 0.0,
N = 1024, $NT_s$ = 1 s, SNR = 0.008 (-21 dB), cell width = 1 Hz.)

Fig. 2. Frequency tracks for a sinusoidally modulated (0.6 Hz
amplitude) tone in noise: a) intensity modulated spectrogram;
b) PIE frequency estimates; c) Viterbi track from HMM/AP
tracker; d) MCO track from HMM/AP tracker; e) GOP
function from HMM/AP tracker.

(HMM parameters as for Fig. 1, except that cell width is 0.3 Hz.)




Frequency  Line  Detector/Tracker 
Using  Hidden  Markov  Models 
With  Amplitude  Information 

R.  L.  Streit 




Abstract 


Four  different  methods  for  utilizing  amplitude  in  hidden 
Markov  model  (HMM)  detector/trackers  are  presented.  All  four  of 
the  HMM  detector/trackers  are  algorithmically  identical  in  their 
basic  structure.  The  only  differences  between  the  proposed 
trackers  are  confined  to  the  conditional  probability  density 
functions  (PDFs)  of  the  input  measurements.  The  fundamental 
HMM  algorithm  structure  is  presented  in  matrix  form,  and  the 
necessary  conditional  PDFs  for  the  four  detector/trackers  are 
derived. 




TM.  911143 


Introduction 


The papers [1, 2] are the first papers to discuss the inclusion of
amplitude information into detector/trackers based on hidden Markov
models (HMMs); however, these papers do not give a detailed description
of the way in which amplitude information is utilized. Of the many
ways to include amplitude information into the measurement sequence of
the HMM detector/tracker, four are described here. We denote these four
detector/trackers by HMM/A00, HMM/A01, HMM/A10, and HMM/A11. In the
notation HMM/Axy, the digit x (respectively y) is 1 if the decision
(respectively quantization) step defined below is present and 0 if it is
absent. HMM/A11 is identical to the detector/tracker described in the
fundamental HMM detector/tracker paper [3], and HMM/A10 is identical to
the best of the detector/trackers discussed in papers [1] and [2].
Familiarity with the content and notation of [3] is assumed here.

An important feature of detector/trackers based on HMMs is that
they all have fundamentally identical algorithmic structures.
Different input measurements change the measurement likelihood
function, but do not otherwise affect the detector/tracker algorithm.
Because of this feature, we describe below only the HMM
forward-backward algorithm that is used to construct the probability
field from which the continuous outputs of the HMM detector/tracker are
derived. The discrete output is the Viterbi track, and it is computed
by a dynamic programming algorithm. The minor changes required to
compute the Viterbi track (essentially, replacing the sums in the
forward recursion by maximizations and then backtracking) are analogous
to those required for the continuous outputs. For further background,
see [3].

The detector/tracker HMM/A11 uses a fixed frequency gate, that is,
a fixed contiguous subset of size $n \ge 2$ of the full set of DFT
(discrete Fourier transform) frequency cells. The DFT size is $N \ge n$,
and the fixed gate cells are indexed, for convenience, $1, 2, \ldots, n$.
Amplitude information is utilized only indirectly, that is, the input
measurement to HMM/A11 is the index of the DFT cell having the largest
amplitude, provided this cell exceeds a specified threshold, $D$, and
lies within the gate. The measurement $z$ is the output of a two-step
measurement process. The first step, called the decision step, chooses
the largest amplitude DFT cell. The outputs of the decision step are
the DFT cell index $i \ge 1$ and its amplitude $r_i$. The second step, called
the quantization step, quantizes the amplitude $r_i$ to one bit. The
amplitude threshold determines the one-bit quantization breakpoint.
The output of the quantization step is the DFT cell index $i$ if $r_i \ge D$,
and the integer 0 if $r_i < D$. The measurement process of the HMM/A11
detector/tracker therefore suppresses (compresses) available amplitude
data in each step. The data compression steps affect
detector/tracker  performance  in  different  ways. 
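
The four measurement processes can be contrasted in a short Python sketch. It assumes the amplitudes of the n gate cells are given as a list (cell j+1 has amplitude `r[j]`) and restricts the decision step to the gate; the outputs for HMM/A10, HMM/A01 and HMM/A00 are one plausible reading of "dropping" each step, and the function name is hypothetical.

```python
def measure(r, D, decision=True, quantize=True):
    """Hypothetical HMM/Axy measurement from gate amplitudes r[0..n-1].

    decision=True,  quantize=True  (HMM/A11): 1-based index i of the largest
        amplitude if r_i >= D, else 0.
    decision=True,  quantize=False (HMM/A10): (i, r_i) for the largest cell.
    decision=False, quantize=True  (HMM/A01): one bit per cell, r_j >= D.
    decision=False, quantize=False (HMM/A00): the unmodified amplitudes.
    """
    if decision:
        i = max(range(len(r)), key=lambda j: r[j]) + 1
        if quantize:
            return i if r[i - 1] >= D else 0
        return (i, r[i - 1])
    if quantize:
        return tuple(int(a >= D) for a in r)
    return tuple(r)
```

Moving from HMM/A11 to HMM/A00, each measurement carries progressively more of the available amplitude data, at the cost of a more complicated likelihood $b_i(z)$.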

The distinction between HMM/A00, HMM/A01, HMM/A10, and HMM/A11 is
their different measurement processes. The HMM/A00 measurement process
eliminates both the decision and quantization steps. Since HMM/A00
utilizes all the available amplitude information, its performance
should be the best of the four. The HMM/A10 measurement process
eliminates  the  quantization  step,  but  not  the  decision  step,  of  the 
basic  HMM  measurement  process.  The  HMM/A01  measurement  process 
eliminates  the  decision  step  but  not  the  quantization  step.  As 
described  above,  the  HMM/A11  measurement  process  uses  both  decision  and 
quantization  steps. 

A detector/tracker HMM must have two types of states: those
corresponding to "signal absent" hypotheses and those corresponding to
"signal present" hypotheses. If either type of state is not used,
then the HMM cannot be described as a detector/tracker. In the
frequency line tracking application utilizing fixed gates, a signal can
be absent because it has faded or because it has exited either to the
right or to the left of the fixed gate. The signal can also be present in
more than one way, i.e., the signal can lie in any one of the DFT
cells in the gate. For this discussion, we use n "signal present"
states and one "signal absent" state. The "signal absent" state is
indexed 0 and called the zero state. The n "signal present" states
are numbered from 1 to n and correspond to signal frequency lines
centered in the n DFT cells of the gate.



HMM  Detector/Tracker  Algorithm 

Let $Z = (z_1, \ldots, z_T)$ denote a measurement sequence of length $T \ge 1$
that is input to an $(n+1)$-state HMM detector/tracker. The forward
algorithm computes the vectors $\alpha_t \in \mathbb{R}^{n+1}$ by means of an algorithm that
is easily stated in terms of matrix products. Let $'$ denote matrix
transposition. Then the forward algorithm is defined by


$$\alpha_1 = B(z_1)\,\pi, \qquad \alpha_{t+1} = B(z_{t+1})\,A'\,\alpha_t, \quad t = 1, \ldots, T-1, \tag{1}$$

where $\pi \in \mathbb{R}^{n+1}$ is the initial state probability vector, $A = [a_{ij}] \in
\mathbb{R}^{(n+1)\times(n+1)}$ is the state transition probability matrix, and, for
arbitrary measurements $z$, the matrix $B(z) \in \mathbb{R}^{(n+1)\times(n+1)}$ is defined by

$$B(z) = \mathrm{Diag}\,[\,b_0(z),\ b_1(z),\ \ldots,\ b_n(z)\,],$$


and where $b_i(z)$ denotes the likelihood function of the measurement $z$
conditioned on the signal state $i$, $i = 0, 1, \ldots, n$. Similarly, the
backward algorithm computes the vectors $\beta_t \in \mathbb{R}^{n+1}$ by means of an
algorithm that is equivalent to the matrix product


$$\beta_T = (1, 1, \ldots, 1)' \in \mathbb{R}^{n+1}, \qquad \beta_t = A\,B(z_{t+1})\,\beta_{t+1}, \quad t = T-1, \ldots, 1. \tag{2}$$


Probabilistic interpretations for the vectors $\alpha_t$ and $\beta_t$ are given in
[3]. The continuous detector/tracker outputs are derived (see [3])
from the probability field, denoted by $F(t,i)$. The numerical value of
$F(t,i)$ is the probability that the signal occupies state $i$ at time $t$,
conditioned on all the available measurements $Z$. It is defined in
terms of the components $\{\alpha_t(i)\}$ and $\{\beta_t(i)\}$ of $\alpha_t$ and $\beta_t$ by


$$F(t,i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{k=0}^{n} \alpha_t(k)\,\beta_t(k)}. \tag{3}$$




In equation (3), $t$ ranges from 1 to $T$, and $i$ ranges from 0 to $n$. An
arbitrary nonzero scale factor $\kappa_t$ can be applied to the diagonal matrix
$B(z_t)$ without altering the probability field $F(t,i)$, because such scale
factors cancel out in the definition (3). The judicious use of scale
factors can reduce the complexity of the required likelihood
functions and give greater insight into the underlying theoretical
structure.
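The recursions (1)-(3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the memorandum's implementation. The function name `probability_field`, the list-of-lists layout, and the convention that `A[i][j]` is the probability of moving from state i to state j are all choices made here. Because scale factors cancel in (3), the sketch rescales $\alpha_t$ and $\beta_t$ to unit sum at every step to avoid underflow on long sequences. The usage lines confirm that multiplying every $b_i(z_t)$ by a common factor leaves $F(t,i)$ unchanged.

```python
def probability_field(pi, A, B):
    """Probability field F(t, i) of (3) from the forward (1) and backward (2) recursions.

    pi : initial state probabilities, length n+1 (state 0 is the zero state)
    A  : transition matrix, A[i][j] = Prob(state j at t+1 | state i at t)
    B  : B[t][i] = b_i(z_t), the diagonal of B(z_t), for t = 0, ..., T-1
    """
    m, T = len(pi), len(B)

    def normalize(v):
        # Any positive rescaling of alpha_t or beta_t cancels in (3).
        s = sum(v)
        return [x / s for x in v]

    # Forward recursion (1): alpha_1 = B(z_1) pi, alpha_{t+1} = B(z_{t+1}) A' alpha_t.
    alpha = [normalize([B[0][i] * pi[i] for i in range(m)])]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append(normalize([B[t][j] * sum(A[i][j] * prev[i] for i in range(m))
                                for j in range(m)]))

    # Backward recursion (2): beta_T = (1, ..., 1)', beta_t = A B(z_{t+1}) beta_{t+1}.
    beta = [[1.0] * m for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = normalize([sum(A[i][j] * B[t + 1][j] * beta[t + 1][j] for j in range(m))
                             for i in range(m)])

    # Probability field (3).
    return [normalize([alpha[t][i] * beta[t][i] for i in range(m)]) for t in range(T)]


# Usage: a gate with n = 2 cells (states 0, 1, 2) and T = 3 measurements.
pi = [0.5, 0.25, 0.25]
A = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
B = [[1.0, 2.0, 0.5], [1.0, 3.0, 0.4], [1.0, 0.7, 0.9]]
F = probability_field(pi, A, B)
F_scaled = probability_field(pi, A, [[5.0 * b for b in row] for row in B])
```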

From equations (1)-(3), it is clear that the different measurement
characteristics affect the detector/tracker output only via the state
likelihood functions $b_i(z)$; the algorithms are otherwise blind to the
measurement. The likelihood functions $b_i(z)$ for each of the four
amplitude measurement processes are derived below. Two probability
density functions (PDFs) of the measured amplitude $r$ in a DFT cell are
required. The PDF of $r$ conditioned on the simple hypothesis "signal
in state $i \ne 0$" is given by

$$P_1(r) = \frac{2rN}{\sigma^2}\, I_0\!\left(\frac{ArN}{\sigma^2}\right) \exp\!\left[-\frac{N(4r^2 + A^2)}{4\sigma^2}\right], \tag{4}$$

and the PDF of $r$ conditioned on the simple hypothesis "signal in state
$i = 0$" is given by

$$P_2(r) = \frac{2rN}{\sigma^2} \exp\!\left[-\frac{N r^2}{\sigma^2}\right], \tag{5}$$

where $I_0(\cdot)$ denotes the modified Bessel function of order zero and $N$ is
the size of the DFT. The derivation of these PDFs assumes zero-mean white
Gaussian noise of variance $\sigma^2$ and a signal frequency line of amplitude $A$
that is centered in the DFT cell. It is necessary to integrate these
functions in certain of the HMM detector/trackers. Note that the function
$P_2(r)$ is easily integrable in closed form, but that $P_1(r)$ is not. Further
details are given in [3].
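The two PDFs (4)-(5) are simple to evaluate. The Python sketch below is illustrative only. The function names, the power-series evaluation of $I_0$ (adequate for the moderate arguments used here but not for large ones), and the Simpson-rule integrator are assumptions of this sketch. It checks numerically that each PDF integrates to one. It also checks that the closed-form integral $\int_0^D P_2(r)\,dr = 1 - \exp(-ND^2/\sigma^2)$ agrees with quadrature. By contrast, $\int_0^D P_1(r)\,dr$ has to be computed numerically.

```python
import math


def bessel_i0(x):
    """Modified Bessel function of order zero, by its power series (moderate x only)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def p1(r, A, N, sigma2):
    """PDF (4): cell amplitude when a line of amplitude A is centered in the cell."""
    return (2 * r * N / sigma2) * bessel_i0(A * r * N / sigma2) \
        * math.exp(-N * (4 * r * r + A * A) / (4 * sigma2))


def p2(r, N, sigma2):
    """PDF (5): cell amplitude when only noise is present."""
    return (2 * r * N / sigma2) * math.exp(-N * r * r / sigma2)


def p2_integral(D, N, sigma2):
    """Closed form of the integral of P2 from 0 to D."""
    return 1.0 - math.exp(-N * D * D / sigma2)


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    s = f(a) + f(b)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


# Usage with illustrative values N = 1, sigma^2 = 1, A = 3.
mass1 = simpson(lambda r: p1(r, 3.0, 1, 1.0), 0.0, 10.0)
mass2 = simpson(lambda r: p2(r, 1, 1.0), 0.0, 10.0)
below = simpson(lambda r: p2(r, 1, 1.0), 0.0, 1.5)
```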




Description of HMM/A00


This measurement process uses neither a decision step nor a
quantization step. Therefore, no quantization threshold is used,
i.e., $D = 0$. The measurement available to the detector/tracker at any
given time $t$ takes the form $z = (r_1, r_2, \ldots, r_n)$, where $r_i$ is the
measured amplitude of the $i$-th DFT cell in the tracking gate. There
are two cases. If the signal state is the zero state, no signal is
present in any of the $n$ DFT cells of the gate. Since the measured
amplitudes $r_j$ in the cells are independent, we have

$$b_0(z) = \kappa \prod_{j=1}^{n} P_2(r_j), \tag{6}$$


where $\kappa$ is a nonzero, state-independent scale constant to be determined.
Similarly, if the signal state is $i \ge 1$, then a frequency line lies in
the center of the $i$-th DFT cell. Because of the center cell
assumption, the amplitude measurements are independent from cell to
cell, so that


$$b_i(z) = \kappa\, \frac{P_1(r_i)}{P_2(r_i)} \prod_{j=1}^{n} P_2(r_j), \quad i \ge 1. \tag{7}$$


We now choose $\kappa$ so that $b_0(z) = 1$; that is, $\kappa$ is the reciprocal of the
product on the right-hand side of equation (6). Therefore,


$$b_i(z) = \begin{cases} 1, & i = 0, \\[4pt] \dfrac{P_1(r_i)}{P_2(r_i)}, & i \ge 1. \end{cases} \tag{8}$$


It is interesting to note that the function $b_i(z)$ is the likelihood ratio of a
most powerful test on the measured amplitude $r_i$ for "signal in state $i \ne 0$" vs
"signal in state $i = 0$".

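When (4) and (5) are substituted into (8), the $r_i^2$ terms cancel, and the ratio reduces to $P_1(r_i)/P_2(r_i) = I_0(Ar_iN/\sigma^2)\exp[-NA^2/(4\sigma^2)]$. This ratio increases monotonically in $r_i$. A minimal Python sketch of the HMM/A00 likelihood vector follows. The function name `a00_likelihoods` and the power-series evaluation of $I_0$ are assumptions made here for illustration.

```python
import math


def bessel_i0(x):
    """Modified Bessel function of order zero, by its power series (moderate x only)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def a00_likelihoods(r, A, N, sigma2):
    """Diagonal of B(z) for HMM/A00, equation (8), with kappa chosen so that b_0(z) = 1.

    r : measured amplitudes (r_1, ..., r_n) of the n gate cells at one time step.
    """
    shrink = math.exp(-N * A * A / (4.0 * sigma2))  # common factor exp(-N A^2 / (4 sigma^2))
    return [1.0] + [bessel_i0(A * ri * N / sigma2) * shrink for ri in r]


# Usage: three gate cells, the middle one holding a large amplitude.
b = a00_likelihoods([0.4, 2.5, 0.6], A=3.0, N=1, sigma2=1.0)
```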



Description  of  HMM/A01 

This measurement process does not use a decision step, but does
use a quantization step. A threshold $D > 0$ defines the one-bit
quantization breakpoint. (The case $D = 0$ is clearly not useful in this
application, since every cell would then meet the threshold.) The
measurement input to the detector/tracker at any given time $t$ takes
the form of a set:

z = {DFT gate cells whose amplitudes equal or exceed D}.

The measurement set $z$ indirectly identifies those gate cells whose
amplitudes fall below $D$. No other information about cell amplitude
values is contained in $z$. If the measurement set $z$ is empty, it
contains the useful information that no cell in the gate reaches $D$.


There are two cases. If the signal state is the zero state, no
signal is present and, because of the cell amplitude independence
assumptions,


$$b_0(z) = \kappa \left[\int_D^{\infty} P_2(r)\,dr\right]^{\#(z)} \left[\int_0^{D} P_2(r)\,dr\right]^{n-\#(z)},$$


where $\#(z)$ is the number of measurements in the set $z$. If $z$ is empty,
then $\#(z) = 0$. The integral from $D$ to $\infty$ is the probability that the
amplitude in a cell exceeds the threshold $D$, while the integral from 0
to $D$ is the probability that the amplitude is less than $D$. Suppose the
signal state is $i \ge 1$. If $i \in z$, then


$$b_i(z) = \kappa \left[\int_D^{\infty} P_2(r)\,dr\right]^{\#(z)-1} \left[\int_D^{\infty} P_1(r)\,dr\right] \left[\int_0^{D} P_2(r)\,dr\right]^{n-\#(z)}.$$


Alternatively, in the "missed detection" case, $i \notin z$ and


$$b_i(z) = \kappa \left[\int_D^{\infty} P_2(r)\,dr\right]^{\#(z)} \left[\int_0^{D} P_2(r)\,dr\right]^{n-\#(z)-1} \left[\int_0^{D} P_1(r)\,dr\right].$$


Setting $\kappa$ so that $b_0(z) = 1$ gives




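$$b_i(z) = \begin{cases} 1, & \text{if } i = 0, \\[4pt] \dfrac{\int_0^{D} P_1(r)\,dr}{\int_0^{D} P_2(r)\,dr}, & \text{if } i \notin z, \\[10pt] \dfrac{\int_D^{\infty} P_1(r)\,dr}{\int_D^{\infty} P_2(r)\,dr}, & \text{if } i \in z. \end{cases} \tag{9}$$

The function $b_i(z)$ is the likelihood ratio of a most powerful test on the
measurement types, namely

(1) ambiguous detection (that is, $i \in z$), and

(2) missed detection (that is, $i \notin z$),

for "signal in state $i \ne 0$" vs "signal in state $i = 0$".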
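The A01 likelihoods (9) need only two numbers for a given threshold $D$: the probabilities that a cell falls below $D$ with and without a centered line. The Python sketch below illustrates this under several assumptions made here: the function names, the Simpson quadrature, and the power series for $I_0$. In it, $\int_0^D P_2$ uses the closed form $1-\exp(-ND^2/\sigma^2)$, and $\int_0^D P_1$ is computed numerically. The complementary integrals from $D$ to $\infty$ then follow as one minus these values.

```python
import math


def bessel_i0(x):
    """Modified Bessel function of order zero, by its power series (moderate x only)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def p1(r, A, N, sigma2):
    """PDF (4) of the amplitude in a cell holding a centered line."""
    return (2 * r * N / sigma2) * bessel_i0(A * r * N / sigma2) \
        * math.exp(-N * (4 * r * r + A * A) / (4 * sigma2))


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    s = f(a) + f(b)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


def a01_likelihoods(z, n, D, A, N, sigma2):
    """Diagonal of B(z) for HMM/A01, equation (9).

    z : set of gate-cell indices (1..n) whose amplitudes equal or exceed D.
    """
    below1 = simpson(lambda r: p1(r, A, N, sigma2), 0.0, D)  # integral of P1 over [0, D]
    below2 = 1.0 - math.exp(-N * D * D / sigma2)              # integral of P2 over [0, D]
    missed = below1 / below2                     # i not in z
    ambiguous = (1.0 - below1) / (1.0 - below2)  # i in z
    return [1.0] + [ambiguous if i in z else missed for i in range(1, n + 1)]


# Usage: gate of n = 4 cells, threshold D = 1.5; cells 2 and 3 reach the threshold.
b = a01_likelihoods({2, 3}, n=4, D=1.5, A=3.0, N=1, sigma2=1.0)
```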




Description  of  HMM/A10 

This measurement process uses a decision step, but not a
quantization step. Because there is no quantization step, the
threshold $D$ is set to 0. (N.B. The paper [1] considers the general
case $D \ge 0$ and concludes, on the basis of simulation, that $D = 0$ is
optimal for detection performance.) The measurement available to the
detector/tracker at any given time $t$ takes the form

z = {DFT gate cell k has amplitude $r_k$, and all gate amplitudes $\le r_k$}.

The measurement $z$ contains the DFT cell index $k$, its amplitude $r_k$, and
the information that no other cell in the gate has greater amplitude.
If the signal state is zero, then
If  the  signal  state  is  zero,  then 


$$b_0(z) = \kappa \left[\int_0^{r_k} P_2(r)\,dr\right]^{n-1} P_2(r_k).$$


If the signal state is $i \ge 1$ and $i \ne k$ (the missed detection case),
then


$$b_i(z) = \kappa \left[\int_0^{r_k} P_2(r)\,dr\right]^{n-2} \left[\int_0^{r_k} P_1(r)\,dr\right] P_2(r_k).$$

If $i = k$, however, then


$$b_k(z) = \kappa \left[\int_0^{r_k} P_2(r)\,dr\right]^{n-1} P_1(r_k).$$


Setting the scale factor $\kappa$ so that $b_0(z) = 1$ gives the result



$$b_i(z) = \begin{cases} 1, & \text{if } i = 0, \\[4pt] \dfrac{P_1(r_k)}{P_2(r_k)}, & \text{if } i = k, \\[10pt] \dfrac{\int_0^{r_k} P_1(r)\,dr}{\int_0^{r_k} P_2(r)\,dr}, & \text{if } i \ne k. \end{cases} \tag{10}$$

The function $b_i(z)$ is the likelihood ratio of a most powerful test on the
measurement types, namely

(1) correct detection (that is, $i = k$) with amplitude $r_k$, and

(2) missed detection (that is, $i \ne k$) with amplitude $r_k$,

for "signal in state $i \ne 0$" vs "signal in state $i = 0$".
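The A10 likelihoods (10) can be sketched in the same way. Only the largest cell amplitude $r_k$ and its index $k$ enter the computation. The ratio for $i = k$ simplifies to $I_0(Ar_kN/\sigma^2)\exp[-NA^2/(4\sigma^2)]$, and every other signal state shares the missed-detection ratio. The function names, the power series for $I_0$, and the Simpson quadrature are illustrative assumptions of this sketch.

```python
import math


def bessel_i0(x):
    """Modified Bessel function of order zero, by its power series (moderate x only)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def p1(r, A, N, sigma2):
    """PDF (4) of the amplitude in a cell holding a centered line."""
    return (2 * r * N / sigma2) * bessel_i0(A * r * N / sigma2) \
        * math.exp(-N * (4 * r * r + A * A) / (4 * sigma2))


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    s = f(a) + f(b)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


def a10_likelihoods(k, rk, n, A, N, sigma2):
    """Diagonal of B(z) for HMM/A10, equation (10): cell k (1..n) has the largest amplitude rk."""
    correct = bessel_i0(A * rk * N / sigma2) * math.exp(-N * A * A / (4.0 * sigma2))  # P1/P2 at rk
    below1 = simpson(lambda r: p1(r, A, N, sigma2), 0.0, rk)  # integral of P1 over [0, rk]
    below2 = 1.0 - math.exp(-N * rk * rk / sigma2)            # integral of P2 over [0, rk]
    missed = below1 / below2
    return [1.0] + [correct if i == k else missed for i in range(1, n + 1)]


# Usage: gate of n = 4 cells; cell 2 holds the largest amplitude, 2.5.
b = a10_likelihoods(k=2, rk=2.5, n=4, A=3.0, N=1, sigma2=1.0)
```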


Description of HMM/A11


This measurement process uses both a decision and a quantization
step. A quantization threshold $D > 0$ determines the one-bit
quantization breakpoint. The measurement available to the
detector/tracker at any given time $t$ is one of the following:

$z_0$ = {no DFT gate cell has amplitude exceeding D},

$z_k$ = {DFT gate cell k has maximum amplitude and its amplitude $\ge D$},

where $k = 1, 2, \ldots, n$. The measurements $z_0$ and $z_k$ contain no
information about the maximum amplitude other than the fact that it
does or does not exceed the threshold $D$. Note that the measurement $z_0$
contains no information on the location of the gate cell of largest
amplitude. If the signal state is zero, then for measurement $z_0$


$$b_0(z_0) = \kappa \left[\int_0^{D} P_2(r)\,dr\right]^{n}$$

and, for measurements $z_k$, $k \ge 1$,


$$b_0(z_k) = \kappa \int_D^{\infty} P_2(r) \left[\int_0^{r} P_2(s)\,ds\right]^{n-1} dr = \frac{\kappa - b_0(z_0)}{n}.$$


The identity follows from a normalization argument (i.e., the sum of
$b_0(z_k)$ over all $k \ge 0$ equals $\kappa$) and the fact that $b_0(z_k)$ is the same for
every $k \ge 1$. If the signal state is $i \ne 0$, then for measurement $z_0$


$$b_i(z_0) = \kappa \left[\int_0^{D} P_1(r)\,dr\right] \left[\int_0^{D} P_2(r)\,dr\right]^{n-1}.$$



If the measurement is $z_i$, then


$$b_i(z_i) = \kappa \int_D^{\infty} P_1(r) \left[\int_0^{r} P_2(s)\,ds\right]^{n-1} dr.$$


The function $P_2(r)$ is easily integrated, so


$$b_i(z_i) = \kappa \int_D^{\infty} P_1(r) \left[1 - \exp\!\left(-\frac{N r^2}{\sigma^2}\right)\right]^{n-1} dr.$$


Thus, evaluating $b_i(z_i)$ requires the numerical evaluation of a
one-dimensional integral. Finally, if the measurement is $z_k$, $k \notin \{0, i\}$,
then


$$b_i(z_k) = \kappa \int_D^{\infty} P_2(r) \left[\int_0^{r} P_1(s)\,ds\right] \left[\int_0^{r} P_2(s)\,ds\right]^{n-2} dr.$$

Normalization  arguments  give  the  equivalent  expression 


$$b_i(z_k) = \frac{\kappa - b_i(z_0) - b_i(z_i)}{n-1}.$$


The expressions above, with $\kappa = 1$, were first given in [3], and are
easy to use in practice.

It is worthwhile to rewrite the above equations in a form that
exhibits their theoretical character. To this end, for each positive
integer $n$, we define the random variable $Q_n = \max\{R_1, \ldots, R_n\}$, where
the random variables $R_j$ are independent and identically distributed,
with common PDF given by $P_2(r)$. Conditioned on the event $Q_n \ge D$, the
PDF of $Q_n$ is given by




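$$Q_n(r) = \frac{P_2(r) \left[\int_0^{r} P_2(s)\,ds\right]^{n-1}}{\int_D^{\infty} P_2(u) \left[\int_0^{u} P_2(s)\,ds\right]^{n-1} du}, \qquad r \ge D,$$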

where the denominator is the conditioning term required to make $Q_n(r)$ a
valid PDF. By substituting the definition (5) of $P_2(r)$ and performing
the required integrations, an explicit formula for $Q_n(r)$ can be
obtained, if desired. We now choose the scale factor $\kappa$ so that $b_0(z_k) = 1$
for all $k \ge 0$. Utilizing the functions $Q_n(r)$ gives the result


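$$b_i(z) = \begin{cases} 1, & \text{for } i = 0 \text{ and all } z, \\[4pt] \dfrac{\int_0^{D} P_1(r)\,dr}{\int_0^{D} P_2(r)\,dr}, & \text{for } i \ne 0 \text{ and } z = z_0, \\[10pt] \displaystyle\int_D^{\infty} \frac{P_1(r)}{P_2(r)}\, Q_n(r)\,dr, & \text{for } i \ne 0 \text{ and } z = z_i, \\[10pt] \displaystyle\int_D^{\infty} \frac{\int_0^{r} P_1(s)\,ds}{\int_0^{r} P_2(s)\,ds}\, Q_n(r)\,dr, & \text{for } i \ne 0 \text{ and } z = z_k,\ k \notin \{0, i\}. \end{cases} \tag{11}$$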


The function $b_i(z)$ is the likelihood ratio of a most powerful test on the
measurements $\{z_k\}$ for "signal in state $i \ne 0$" vs "signal in state $i = 0$".
Note that the last two expressions in (11) are expectations of
likelihood ratios with respect to the conditional PDF $Q_n(r)$.
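The A11 expressions can be checked numerically against the normalization arguments used above. When a line is in cell $i$, the probabilities of $z_0$, of $z_i$, and of the $n-1$ outcomes $z_k$ ($k \notin \{0, i\}$) must sum to one. The Python sketch below is an illustration under several assumptions made here. It uses trapezoid quadrature on a uniform grid, truncates $\infty$ at `r_max`, requires $n \ge 2$, and assumes the threshold $D$ lies on the grid. It evaluates the $\kappa = 1$ forms and then the normalized likelihoods (11). The function names are hypothetical.

```python
import math


def bessel_i0(x):
    """Modified Bessel function of order zero, by its power series (moderate x only)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def trapezoid(values, h):
    """Trapezoid rule for equally spaced samples with spacing h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def a11_likelihoods(n, D, A, N, sigma2, r_max=12.0, steps=6000):
    """HMM/A11 (n >= 2): kappa = 1 probabilities and the normalized likelihoods (11)."""
    h = r_max / steps
    grid = [k * h for k in range(steps + 1)]
    f1 = [(2 * r * N / sigma2) * bessel_i0(A * r * N / sigma2)
          * math.exp(-N * (4 * r * r + A * A) / (4 * sigma2)) for r in grid]   # P1, eq. (4)
    f2 = [(2 * r * N / sigma2) * math.exp(-N * r * r / sigma2) for r in grid]  # P2, eq. (5)
    F2 = [1.0 - math.exp(-N * r * r / sigma2) for r in grid]  # closed-form integral of P2
    F1 = [0.0]                                                # running integral of P1
    for k in range(1, steps + 1):
        F1.append(F1[-1] + 0.5 * h * (f1[k - 1] + f1[k]))
    j = round(D / h)            # grid index of the threshold (D assumed to lie on the grid)
    tail = range(j, steps + 1)  # [D, r_max] stands in for [D, infinity)
    # Zero state: probabilities of z_0 and of each z_k, k >= 1.
    b0_z0 = F2[j] ** n
    b0_zk = (1.0 - b0_z0) / n
    # Line in cell i: probabilities of z_0, z_i, and z_k with k not in {0, i}.
    bi_z0 = F1[j] * F2[j] ** (n - 1)
    bi_zi = trapezoid([f1[k] * F2[k] ** (n - 1) for k in tail], h)
    bi_zk = trapezoid([f2[k] * F1[k] * F2[k] ** (n - 2) for k in tail], h)
    raw = {"b0_z0": b0_z0, "b0_zk": b0_zk, "bi_z0": bi_z0, "bi_zi": bi_zi, "bi_zk": bi_zk}
    # Normalized form (11): divide by the zero-state value for the same measurement.
    normalized = {"z0": bi_z0 / b0_z0, "zi": bi_zi / b0_zk, "zk": bi_zk / b0_zk}
    return raw, normalized


raw, b = a11_likelihoods(n=4, D=1.5, A=3.0, N=1, sigma2=1.0)
```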




Concluding  Remarks 


The four detector/trackers presented here have a common HMM
algorithmic structure that is easily implemented. The only
computational difference between them lies in the calculation of the
state-conditional PDFs of the available measurements. In all four
detector/trackers, however, these PDFs can be precomputed either as a
list of all possible measurement outcome probabilities, or as a
likelihood function lookup table with, say, spline interpolation. With
sufficient attention to the PDF calculation details, all four
detector/trackers should run at approximately the same speed.
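As an illustration of the lookup-table idea for HMM/A00, the sketch below tabulates the likelihood ratio of (8) on a uniform amplitude grid and interpolates it at run time. To stay within the Python standard library, it substitutes piecewise-linear interpolation for the spline interpolation mentioned above. The grid size and range and the function names are assumptions of this sketch. A finer grid or a spline would reduce the interpolation error.

```python
import bisect
import math


def bessel_i0(x):
    """Modified Bessel function of order zero, by its power series (moderate x only)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def build_ratio_table(A, N, sigma2, r_max=8.0, size=801):
    """Tabulate P1(r)/P2(r) = I0(A r N / sigma2) exp(-N A^2 / (4 sigma2)) on [0, r_max]."""
    grid = [r_max * k / (size - 1) for k in range(size)]
    shrink = math.exp(-N * A * A / (4.0 * sigma2))
    return grid, [bessel_i0(A * r * N / sigma2) * shrink for r in grid]


def lookup(grid, values, r):
    """Piecewise-linear interpolation in the table; r outside the grid is clamped."""
    if r <= grid[0]:
        return values[0]
    if r >= grid[-1]:
        return values[-1]
    k = bisect.bisect_right(grid, r)  # grid[k-1] <= r < grid[k]
    w = (r - grid[k - 1]) / (grid[k] - grid[k - 1])
    return (1.0 - w) * values[k - 1] + w * values[k]


grid, values = build_ratio_table(A=3.0, N=1, sigma2=1.0)
approx = lookup(grid, values, 1.234)
exact = bessel_i0(3.0 * 1.234) * math.exp(-9.0 / 4.0)
```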

HMM/A00 should be the best of the detector/trackers since it makes
use of all the available amplitude data. HMM/A10 and HMM/A11 can be
used for tracking on spectrogram displays that compress the measured
amplitude data in ways that are compatible with their measurement
processes. Of these two, HMM/A10 should give better performance than
HMM/A11 because HMM/A10 has more amplitude information available than
HMM/A11. Finally, HMM/A01 may be useful in applications in which the
true signal is mismatched to the signal model used here. This
possibility deserves further study.




References 


1. R. F. Barrett and R. L. Streit, "Automatic Detection of Frequency
Modulated Spectral Lines," Proceedings of the Australian Symposium on
Signal Processing and Its Applications, Adelaide, Australia, 17-19
April 1989.

2. A. K. Steele, R. L. Streit, and R. F. Barrett, "Nonlinear Frequency
Line Tracking Algorithms," Proceedings of the Australian Symposium on
Signal Processing and Its Applications, Adelaide, Australia, 17-19
April 1989.

3. R. L. Streit and R. F. Barrett, "Frequency Line Tracking Using
Hidden Markov Models," IEEE Transactions on Acoustics, Speech, and
Signal Processing, vol. ASSP-38, 1990, pp. 586-598.




Estimation  of  Signal  Amplitude 
And  Background  Noise  Power 
In  Hidden  Markov  Model 
Detector/Trackers 


R.  L.  Streit 




Abstract 


Frequency line detector/trackers based on hidden Markov
models (HMMs) are designed for optimum detection and tracking
performance at a specified design signal-to-noise ratio (SNR). In
practice, their performance is observed to be robust to mismatch
between the design SNR and the true SNR, especially when the
true SNR exceeds the design SNR. A natural way to improve
performance further is to estimate the true SNR in an attempt to
match the design SNR of the HMM detector/tracker to the true
SNR. This memorandum derives maximum likelihood estimates of
signal amplitude and background noise power for a specific HMM
detector/tracker (known as HMM/A00), using the Baum-Welch
reestimation, or EM, method. The estimates are derived by
exploiting the intrinsic training capabilities of general HMMs, so
the approach of this memorandum is not limited to the one HMM
detector/tracker presented here. Different HMM detector/trackers
will generally yield different estimators for signal amplitude and
noise power.




TM.  No.  911189 


INTRODUCTION 


The papers [1, 2] were the first to discuss the inclusion of
amplitude information in detector/trackers based on hidden Markov
models (HMMs), but neither paper discusses the estimation of signal and
noise powers. This memorandum derives maximum likelihood
estimates of signal amplitude and background noise power for the best
of the four HMM detector/trackers described in [3], namely, HMM/A00.

Maximum likelihood estimates of signal amplitude and background
noise power are derived from the likelihood structure imposed on the
measured data by the detector/tracker HMM/A00. The likelihood
structure of HMM/A00 is highly non-Gaussian because it arises
from the "hidden" Markov chain and the specialized measurement
likelihood functions appropriate to the frequency line tracking
application. The methods used here for HMM/A00 can, in principle, be
applied to derive signal and noise power estimates for the other HMM
detector/trackers presented in [3]; however, estimators for these other
trackers are not presented here.

The detector/tracker HMM/A00 uses a fixed frequency gate, that is,
a fixed contiguous subset of size $n \ge 1$ of the full set of DFT
(discrete Fourier transform) frequency cells. The DFT size is $N \ge n$,
and the cells of the fixed gate are indexed, for convenience, $1, 2, \ldots, n$.
The HMM/A00 measurement process utilizes all the available amplitude
information; that is, the output of the measurement process at time $t$
is the vector


$$z_t = (r_{1t}, r_{2t}, \ldots, r_{nt}), \tag{1}$$


where $r_{it}$ is the measured DFT amplitude in gate cell $i$ at time $t$.

The detector/tracker HMM/A00 has two types of states: one
corresponding to the "signal absent" hypothesis and others
corresponding to "signal present" hypotheses. (If either type of state
is not used, then the HMM cannot be described as a detector/tracker.)
The "signal absent" hypothesis is indexed 0 and called the zero state.
The "signal present" hypotheses are numbered from 1 to $n$ and correspond
to signal frequency lines centered in the $n$ DFT cells of the gate.




BACKGROUND ON THE DETECTOR/TRACKER HMM/A00


Let $Z = (z_1, z_2, \ldots, z_T)$ denote a measurement sequence of length
$T \ge 1$ that is input to the $(n+1)$-state detector/tracker HMM/A00, where
the measurements $z_t$ are defined by (1). The forward algorithm computes
the vectors $\alpha_t \in \mathbb{R}^{n+1}$ by means of an algorithm that is easily stated in
terms of matrix products. Let $'$ denote matrix transposition. Then the
forward algorithm is defined by


$$\alpha_1 = B(z_1)\,\pi, \qquad \alpha_{t+1} = B(z_{t+1})\,A'\,\alpha_t, \quad t = 1, \ldots, T-1, \tag{2}$$


where $\pi \in \mathbb{R}^{n+1}$ is the initial state probability vector, $A = [a_{ij}] \in
\mathbb{R}^{(n+1)\times(n+1)}$ is the state transition probability matrix, and, for an
arbitrary measurement $z$, the matrix $B(z) \in \mathbb{R}^{(n+1)\times(n+1)}$ is defined by

$$B(z) = \mathrm{Diag}\,[\,b_0(z),\ b_1(z),\ \ldots,\ b_n(z)\,],$$


and where $b_i(z)$ denotes the likelihood function of the measurement $z$
conditioned on the signal state $i$, $i = 0, 1, \ldots, n$. Similarly, the
backward algorithm computes the vectors $\beta_t \in \mathbb{R}^{n+1}$ by means of an
algorithm that is equivalent to the matrix product


$$\beta_T = (1, 1, \ldots, 1)' \in \mathbb{R}^{n+1}, \qquad \beta_t = A\,B(z_{t+1})\,\beta_{t+1}, \quad t = T-1, \ldots, 1. \tag{3}$$


Probabilistic interpretations for the vectors $\alpha_t$ and $\beta_t$ are given in
[4]. The continuous detector/tracker outputs are derived (see [4] for
further details) from the probability field, denoted by $\gamma_t(i)$. The
numerical value of $\gamma_t(i)$ is the probability that the signal occupies state $i$
at time $t$, conditioned on the entire available measurement sequence $Z$. It
is defined in terms of the components $\{\alpha_t(i)\}$ and $\{\beta_t(i)\}$ of the
vectors $\alpha_t$ and $\beta_t$ by




$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{k=0}^{n} \alpha_t(k)\,\beta_t(k)}. \tag{4}$$


In equation (4), $t$ ranges from 1 to $T$, and $i$ ranges from 0 to $n$.
It follows directly from (4) that

$$\sum_{i=0}^{n} \gamma_t(i) = 1$$

for each time $t$. The name "probability field" for $\gamma_t(i)$ was first used
in [7], where its image enhancement property was first described.

A statistic closely related to the probability field is the "gate
occupancy probability", denoted $G_{op}$. It is defined by

$$G_{op} = \frac{1}{T} \sum_{t=1}^{T} \left[1 - \gamma_t(0)\right] = \frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{n} \gamma_t(i). \tag{5}$$

The zero state is excluded from the sums (5), so the statistic $G_{op}$
accounts only for signal presence somewhere within the gate. An important
interpretation of $G_{op}$ is that it represents the (posterior expected)
fraction of time for which a signal is present within the gate. $G_{op}$ is
potentially useful as a detection statistic.
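Given the probability field, $G_{op}$ is a one-line computation. The sketch below evaluates both forms of (5). The function names are hypothetical, and `gamma` is assumed to be stored as a list of rows with the zero state first. The two forms agree because each row of the probability field sums to one.

```python
def gate_occupancy(gamma):
    """First form of (5): time average of 1 - gamma_t(0); gamma[t][0] is the zero state."""
    T = len(gamma)
    return sum(1.0 - row[0] for row in gamma) / T


def gate_occupancy_from_cells(gamma):
    """Second form of (5): time average of the summed signal-present probabilities."""
    T = len(gamma)
    return sum(sum(row[1:]) for row in gamma) / T


# Usage: T = 3 time steps, n = 2 gate cells.
gamma = [[0.9, 0.06, 0.04], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
g_op = gate_occupancy(gamma)
```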

From equations (2)-(4), it is clear that measurements affect the
detector/tracker output only via the state likelihood functions $b_i(z)$;
HMM/A00 is otherwise blind to the measurement. The likelihood
functions $b_i(z)$ are derived in terms of the following two probability
density functions (PDFs) of the measured amplitude $r$ in a DFT cell.
Define the noise parameter $\rho = 1/\sigma^2$, where $\sigma^2$ is the power (variance)
of the zero-mean white Gaussian background noise. The PDF of $r$
conditioned on the hypothesis "signal in state $i \ne 0$" is the Rician PDF
defined by




$$P_1(r \mid A, \rho) = (2rN\rho)\, I_0(ArN\rho)\, \exp\!\left[-N\rho\left(r^2 + \frac{A^2}{4}\right)\right], \tag{6}$$

where $I_\nu(\cdot)$ denotes the modified Bessel function of order $\nu$, and $N$ is
the size of the DFT. The derivation of this PDF assumes that the signal
frequency line of amplitude $A$ is centered in the DFT cell. The PDF of
$r$ conditioned on the hypothesis "signal in state $i = 0$" is the Rayleigh
PDF defined by

$$P_2(r \mid \rho) = (2rN\rho)\, \exp\!\left[-N r^2 \rho\right]. \tag{7}$$

Note that $P_2(r \mid \rho) = P_1(r \mid 0, \rho)$. Plots of $P_1(r \mid A, \rho)$ for several values
of $A$ for fixed noise power $N\rho = 1$ are given in Figure 1.

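As is shown in [3], for the measurement $z_t$, the state conditional
likelihood functions take the form

$$b_i(z_t) \equiv b_i(z_t \mid A, \rho) = \begin{cases} \kappa_t(\rho), & i = 0, \\[4pt] \kappa_t(\rho)\, \dfrac{P_1(r_{it} \mid A, \rho)}{P_2(r_{it} \mid \rho)}, & i = 1, \ldots, n, \end{cases} \tag{8}$$

where the "scaling" function $\kappa_t(\rho)$ is given by

$$\kappa_t(\rho) = \prod_{k=1}^{n} P_2(r_{kt} \mid \rho). \tag{9}$$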


The expressions (8) and (9) can be simplified, if desired, by substituting
equations (6) and (7).
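Carrying out that substitution shows that only the Bessel function term and the $\rho$-dependent factors survive. The following worked simplification follows directly from (6)-(9):

```latex
\frac{P_1(r \mid A, \rho)}{P_2(r \mid \rho)}
  = \frac{2rN\rho\, I_0(ArN\rho)\, e^{-N\rho(r^2 + A^2/4)}}{2rN\rho\, e^{-N\rho r^2}}
  = I_0(ArN\rho)\, e^{-N\rho A^2/4},

\log \kappa_t(\rho) = \sum_{k=1}^{n} \left[ \log(2 r_{kt} N) + \log\rho - N\rho\, r_{kt}^2 \right],

\log b_i(z_t \mid A, \rho) = \log \kappa_t(\rho) + \log I_0(A r_{it} N\rho) - \frac{N\rho A^2}{4},
  \qquad i = 1, \ldots, n .
```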

Note  that  the  symbol  A  has  been  used  to  denote  both  the  Markov 
chain  transition  probability  matrix  and  also  the  signal  amplitude. 
Both  uses  of  the  symbol  A  are  somewhat  conventional,  and  it  is 
inconvenient  to  change  either.  This  abuse  of  notation  should  cause  no 
confusion. 




DERIVATION  OF  MAXIMUM  LIKELIHOOD  ESTIMATION  EQUATIONS 


Estimates of signal amplitude $A$ and background noise power $\sigma^2$ are
derived using the EM (Expectation-Maximization) method [5]. When the
EM method is applied to HMMs, it is more commonly known as
the Baum-Welch reestimation algorithm [6]. The derivation in this
section is a variant of Baum-Welch that accommodates continuously
variable parameters and the specialized likelihood structure of the
frequency line estimation problem. The discussion here assumes
familiarity with the fundamental HMM detector/tracker concepts as
described in [4] and with the definition of HMM/A00 as described in
[3]. To the extent possible, the notation of [4] is used here.

For times $t = 1, 2, \ldots, T$, the measurement $z_t \in Z$ is given by
(1). The "missing data" in the sense of the EM method is the state
sequence $I$ of the Markov chain. Let

$$I = (i(1), i(2), \ldots, i(T)) \in \mathcal{I},$$

where the index $i(t)$ denotes the true signal state at time $t$, and $\mathcal{I}$
denotes the set of all possible state sequences. (N.B. We use the
notation $i(t)$ instead of the more common $i_t$ to avoid the use of
subscripted subscripts in the sequel.) The $n+1$ Markov chain states are
numbered consecutively from 0 to $n$, so $0 \le i(t) \le n$ for all time $t$.
Let $Z' = Z \cup I$. The PDF of $Z'$, conditioned on the detector/tracker
HMM/A00, is parameterized by the signal amplitude $A$ and the white Gaussian
background noise parameter $\rho = 1/\sigma^2$ and is given by


$$\mathcal{L}(Z' \mid A, \rho) = \pi_{i(1)}\, b_{i(1)}(z_1 \mid A, \rho) \prod_{t=2}^{T} a_{i(t-1),\,i(t)}\, b_{i(t)}(z_t \mid A, \rho). \tag{10}$$


The transition probability matrix $A = [a_{ij}] \in \mathbb{R}^{(n+1)\times(n+1)}$ and the
initial state probability vector $\pi = (\pi_i) \in \mathbb{R}^{n+1}$ define the Markov
chain, and are specified a priori by the detector/tracker HMM/A00,
while the likelihood function $b_i(z_t \mid A, \rho)$ is given explicitly by
equations (8)-(9) above. As required by the EM method, the PDF induced on
the set $\mathcal{I}$ of all state sequences is derived from Bayes' theorem and is
given by


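$$\lambda(I \mid Z, A, \rho) = \frac{\mathcal{L}(Z' \mid A, \rho)}{\mathcal{L}(Z \mid A, \rho)}, \tag{11}$$

where, from the HMM likelihood structure,

$$\mathcal{L}(Z \mid A, \rho) = \sum_{I \in \mathcal{I}} \mathcal{L}(Z' \mid A, \rho). \tag{12}$$

For future use, note that

$$\sum_{I \in \mathcal{I}} \lambda(I \mid Z, A, \rho) = 1, \tag{13}$$

$$\sum_{I \in \mathcal{I} \setminus i(t)} \lambda(I \mid Z, A, \rho) = \gamma_t(i(t)), \tag{14}$$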


where $\gamma_t \equiv \gamma_t(A, \rho) \in \mathbb{R}^{n+1}$ is the likelihood vector computed from
equation (4). In equation (14), the notation $I \in \mathcal{I} \setminus i(t)$ means that the
sum is over all state indices in $I$ except $i(t)$, which is held fixed. The
proof of (13) is obvious, while the proof of (14) requires examining the
probabilistic interpretations of the forward and backward likelihood
vectors, denoted by $\alpha_t \in \mathbb{R}^{n+1}$ and $\beta_t \in \mathbb{R}^{n+1}$, respectively, and computed
from equations (2)-(3). The dependence of $\alpha_t$ and $\beta_t$ on the parameters $A$
and $\rho$ is implicit in the notation. The result (14) is essentially a
corollary of Baum's theorem stating that, for all $t$,


$$\mathcal{L}(Z \mid A, \rho) = \sum_{i(t)=0}^{n} \alpha_t(i(t))\, \beta_t(i(t)). \tag{15}$$

Further details of the proof of (14) are omitted.
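Both (14) and (15) are easy to confirm numerically on a toy problem by enumerating every state sequence. The Python sketch below does this for a small model with three states and four time steps. The model's probabilities and likelihood values are arbitrary illustrative numbers. The transition matrix is named `P` here to avoid the clash with the amplitude $A$. The unscaled recursions are used because the check needs the actual value of $\mathcal{L}(Z \mid A,\rho)$.

```python
import itertools


def joint_likelihood(I, pi, P, B):
    """L(Z' | .) of (10) for one state sequence I = (i(1), ..., i(T))."""
    value = pi[I[0]] * B[0][I[0]]
    for t in range(1, len(I)):
        value *= P[I[t - 1]][I[t]] * B[t][I[t]]
    return value


def forward_backward(pi, P, B):
    """Unscaled forward (2) and backward (3) vectors; P[i][j] is the transition matrix."""
    m, T = len(pi), len(B)
    alpha = [[B[0][i] * pi[i] for i in range(m)]]
    for t in range(1, T):
        alpha.append([B[t][j] * sum(P[i][j] * alpha[-1][i] for i in range(m))
                      for j in range(m)])
    beta = [[1.0] * m for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(P[i][j] * B[t + 1][j] * beta[t + 1][j] for j in range(m))
                   for i in range(m)]
    return alpha, beta


pi = [0.5, 0.25, 0.25]
P = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
B = [[1.0, 2.0, 0.5], [1.0, 3.0, 0.4], [1.0, 0.7, 0.9], [1.0, 1.5, 1.1]]
m, T = len(pi), len(B)

sequences = list(itertools.product(range(m), repeat=T))
total = sum(joint_likelihood(I, pi, P, B) for I in sequences)              # (12)
alpha, beta = forward_backward(pi, P, B)
baum = [sum(alpha[t][i] * beta[t][i] for i in range(m)) for t in range(T)]  # (15)

# (14): hold state 2 fixed at the second time step (index 1) and sum over the rest.
marginal = sum(joint_likelihood(I, pi, P, B) for I in sequences if I[1] == 2) / total
gamma_12 = alpha[1][2] * beta[1][2] / total
```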




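The expectation step of the EM method is defined by

$$Q(A, \rho \mid A', \rho') \equiv E\left[\log \mathcal{L}(Z' \mid A, \rho) \mid Z, A', \rho'\right] = \sum_{I \in \mathcal{I}} \lambda(I \mid Z, A', \rho') \log \mathcal{L}(Z' \mid A, \rho). \tag{16}$$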


In (16), the parameters $A'$ and $\rho'$ are assumed given, and the goal is to
maximize $Q$ by choice of $A$ and $\rho$. Taking the logarithm of $\mathcal{L}(Z' \mid A, \rho)$
in (10) and substituting into (16) gives

$$Q(A, \rho \mid A', \rho') = c + \sum_{I \in \mathcal{I}} \lambda(I \mid Z, A', \rho') \sum_{t=1}^{T} \log b_{i(t)}(z_t \mid A, \rho), \tag{17}$$

where $c$ denotes a constant independent of the parameters $A$ and $\rho$.
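Isolating the sum over $i(t)$ in equation (17), and using the result
(14), gives

$$\begin{aligned} Q(A, \rho \mid A', \rho') &= c + \sum_{t=1}^{T} \sum_{i(t)=0}^{n} \log b_{i(t)}(z_t \mid A, \rho) \sum_{I \in \mathcal{I} \setminus i(t)} \lambda(I \mid Z, A', \rho') \\ &= c + \sum_{t=1}^{T} \sum_{i=0}^{n} \gamma'_t(i) \log b_i(z_t \mid A, \rho), \end{aligned} \tag{18}$$

where $\gamma'_t \equiv \gamma_t(A', \rho') \in \mathbb{R}^{n+1}$ is the state occupancy likelihood vector
defined by equation (4) for the given parameters $A'$ and $\rho'$.
Substituting expressions (8)-(9) into equation (18) gives the function
$Q(A, \rho \mid A', \rho')$ as an explicit function of the parameters $A$ and $\rho$.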


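A stationary point of $Q(A, \rho \mid A', \rho')$ with respect to the independent
variables $A$ and $\rho$ satisfies the equations

$$\frac{\partial Q}{\partial A} = \sum_{t=1}^{T} \sum_{i=0}^{n} \gamma'_t(i)\, \frac{\partial}{\partial A} \log b_i(z_t \mid A, \rho) = 0, \tag{19}$$

$$\frac{\partial Q}{\partial \rho} = \sum_{t=1}^{T} \sum_{i=0}^{n} \gamma'_t(i)\, \frac{\partial}{\partial \rho} \log b_i(z_t \mid A, \rho) = 0. \tag{20}$$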




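The required partial derivatives in (19)-(20) can be computed from
(8)-(9). Using the fact that $I_0'(\cdot) = I_1(\cdot)$, the required partial
derivatives are

$$\frac{\partial}{\partial A} \log \kappa_t(\rho) = 0, \qquad \frac{\partial}{\partial \rho} \log \kappa_t(\rho) = \frac{n}{\rho}\left[1 - N\rho\, \overline{r_t^2}\right],$$

$$\frac{\partial}{\partial A} \log \frac{P_1(r_{it} \mid A, \rho)}{P_2(r_{it} \mid \rho)} = \frac{N\rho}{2}\left[2 r_{it}\, \frac{I_1(A r_{it} N\rho)}{I_0(A r_{it} N\rho)} - A\right],$$

$$\frac{\partial}{\partial \rho} \log \frac{P_1(r_{it} \mid A, \rho)}{P_2(r_{it} \mid \rho)} = N\left[A r_{it}\, \frac{I_1(A r_{it} N\rho)}{I_0(A r_{it} N\rho)} - \frac{A^2}{4}\right],$$

where

$$\overline{r_t^2} = \frac{1}{n} \sum_{i=1}^{n} r_{it}^2 .$$

For later use, we define

$$\overline{r^2} = \frac{1}{T} \sum_{t=1}^{T} \overline{r_t^2} = \frac{1}{nT} \sum_{t=1}^{T} \sum_{i=1}^{n} r_{it}^2 .$$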

Substituting these partial derivatives into the first necessary
condition (19) and simplifying gives the nonlinear equation

$$A = 2 \sum_{t=1}^{T} \sum_{i=1}^{n} \varphi'_{it}\, r_{it}\, \frac{I_1(A r_{it} N\rho)}{I_0(A r_{it} N\rho)}, \tag{22}$$




where the coefficients $\varphi'_{it}$ are defined by

$$\varphi'_{it} = \frac{\gamma_t(i \mid A', \rho')}{T\, G'_{op}}. \tag{23}$$


In equation (23), $G'_{op} \equiv G_{op}(A', \rho')$ denotes the gate occupancy
probability defined by equation (5) for $A'$ and $\rho'$. The definitions
(23) and (5) imply that $\varphi'_{it} \ge 0$ and that

$$\sum_{t=1}^{T} \sum_{i=1}^{n} \varphi'_{it} = 1. \tag{24}$$


The coefficients $\{\varphi'_{it}\}$ are independent of the unknown parameters $A$ and
$\rho$. Equation (22) is one of the two coupled estimation equations. The
other equation is derived by substituting the necessary partial
derivatives into (20), using (22) to eliminate the Bessel function ratio
terms, and simplifying. This gives

$$\frac{\sigma^2}{N} = \frac{1}{N\rho} = \overline{r^2} - G'_{op}\, \frac{A^2}{4n}. \tag{25}$$

Equation (25) is the second estimation equation. Substituting (25)
into (22) gives a single nonlinear equation in the signal amplitude $A$,
namely,


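$$A = 2 \sum_{t=1}^{T} \sum_{i=1}^{n} \varphi'_{it}\, r_{it}\, \frac{I_1\!\left(A r_{it} \big/ \left(\overline{r^2} - G'_{op} A^2/(4n)\right)\right)}{I_0\!\left(A r_{it} \big/ \left(\overline{r^2} - G'_{op} A^2/(4n)\right)\right)}. \tag{26}$$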


The weights $\varphi'_{it}$ are not updated while solving equation (26)
numerically for the signal amplitude $A$; therefore, equation (26) can be
solved by any suitable one-dimensional iteration method such as
bisection or Newton-Raphson. Care should be exercised to account
for the possibility of multiple solutions of (26); for example, $A = 0$
always satisfies (26) because $I_1(0) = 0$. After $A$ is found, the noise
parameter $\rho$ is computed from equation (25).




The EM method requires that the global maximum of $Q$ with respect to $A$
and $\rho$ be found at each iteration. This requirement is needed if the EM
algorithm is to be proved mathematically convergent. The proof is omitted.

The maximum likelihood estimates are computed by an "inner-outer"
iteration. The EM algorithm is the outer iteration, while the inner
iteration is, say, a bisection or gradient descent method for solving
the necessary conditions (22) and (25). The EM algorithm is explicitly
stated as follows:

EM Algorithm for Maximum Likelihood Estimates $\hat{A}$ and $\hat{\rho}$:

1. Initialize: $A(0) > 0$, $\rho(0) > 0$, $k = 0$.

2. Set $k = k + 1$.

3. Let $A' = A(k-1)$ and $\rho' = \rho(k-1)$. Then:

   a. Using equations (2)-(4), compute the vectors $\{\alpha_t, \beta_t, \gamma_t\}_{t=1}^{T}$.

   b. Solve equations (22) and (25) for $A(k)$ and $\rho(k)$.

4. Test for convergence: Done?

   NO: Loop to Step 2.

   YES: Set $\hat{A} = A(k)$ and $\hat{\rho} = \rho(k)$.
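A compact Python sketch of the algorithm above may help clarify how the pieces fit together. It is a hedged illustration, not the memorandum's code, and rests on several assumptions made here. The inner solver is bisection on (26), bracketed away from the trivial root $A = 0$. The likelihoods drop the common factor $\kappa_t(\rho)$, which cancels in (4). $I_1/I_0$ switches to its large-argument expansion above $x = 50$. The transition matrix is named `P`. The simulated data draw each amplitude as $|A/2 + \text{complex Gaussian noise}|$ with per-quadrature variance $1/(2N\rho)$, which is the Rice distribution (6). All function names, the initial values, and the simulation are hypothetical.

```python
import math
import random


def bessel_i0_i1(x):
    """Power series for (I_0(x), I_1(x)); adequate for the moderate x used here."""
    i0 = t0 = 1.0
    i1 = t1 = x / 2.0
    k = 0
    while t0 > 1e-17 * i0:
        k += 1
        t0 *= (x / 2.0) ** 2 / (k * k)
        t1 *= (x / 2.0) ** 2 / (k * (k + 1))
        i0, i1 = i0 + t0, i1 + t1
    return i0, i1


def bessel_ratio(x):
    """I_1(x)/I_0(x), switching to the expansion 1 - 1/(2x) - 1/(8x^2) for x > 50."""
    if x > 50.0:
        return 1.0 - 1.0 / (2.0 * x) - 1.0 / (8.0 * x * x)
    i0, i1 = bessel_i0_i1(x)
    return i1 / i0


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


def state_probabilities(r, A, rho, N, pi, P):
    """gamma_t(i) from (2)-(4), with b_0 = 1 and b_i = I_0(A r_it N rho) exp(-N rho A^2/4)."""
    m, T = len(pi), len(r)
    B = [[1.0] + [bessel_i0_i1(A * x * N * rho)[0] * math.exp(-N * rho * A * A / 4.0)
                  for x in row] for row in r]
    alpha = [normalize([B[0][i] * pi[i] for i in range(m)])]  # rescaling cancels in (4)
    for t in range(1, T):
        alpha.append(normalize([B[t][j] * sum(P[i][j] * alpha[-1][i] for i in range(m))
                                for j in range(m)]))
    beta = [[1.0] * m for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = normalize([sum(P[i][j] * B[t + 1][j] * beta[t + 1][j] for j in range(m))
                             for i in range(m)])
    return [normalize([a * b for a, b in zip(alpha[t], beta[t])]) for t in range(T)]


def solve_amplitude(r, phi, g_op):
    """Bisection for a nonzero root of (26); returns A and 1/(N rho) from (25)."""
    n, T = len(r[0]), len(r)
    r2bar = sum(x * x for row in r for x in row) / (n * T)

    def residual(A):
        v = r2bar - g_op * A * A / (4.0 * n)  # 1/(N rho) according to (25)
        return A - 2.0 * sum(phi[t][i] * r[t][i] * bessel_ratio(A * r[t][i] / v)
                             for t in range(T) for i in range(n))

    hi = 2.0 * math.sqrt(n * r2bar / g_op) * (1.0 - 1e-9)  # keeps 1/(N rho) positive
    lo = 1e-3 * hi
    if residual(lo) >= 0.0:
        raise ValueError("no sign change bracketed; only the trivial root A = 0 may exist")
    while hi - lo > 1e-9 * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if residual(mid) < 0.0 else (lo, mid)
    A = 0.5 * (lo + hi)
    return A, r2bar - g_op * A * A / (4.0 * n)


def em_estimate(r, N, pi, P, A0, rho0, max_iter=30, tol=1e-5):
    """Outer EM loop (steps 1-4 above); r[t][i - 1] holds r_it."""
    n, T = len(r[0]), len(r)
    A, rho = A0, rho0
    for _ in range(max_iter):
        gamma = state_probabilities(r, A, rho, N, pi, P)                            # step 3a
        g_op = sum(1.0 - g[0] for g in gamma) / T                                    # (5)
        phi = [[gamma[t][i + 1] / (T * g_op) for i in range(n)] for t in range(T)]  # (23)
        A_new, v = solve_amplitude(r, phi, g_op)                                     # (26)
        rho_new = 1.0 / (N * v)                                                      # (25)
        done = abs(A_new - A) <= tol * A_new and abs(rho_new - rho) <= tol * rho_new
        A, rho = A_new, rho_new
        if done:
            break
    return A, rho


# Usage on simulated data (hypothetical): n = 3 cells, true A = 3, N = 1, rho = 1.
random.seed(7)
n, T, N, A_true, rho_true = 3, 150, 1, 3.0, 1.0
P = [[0.85 if i == j else 0.05 for j in range(n + 1)] for i in range(n + 1)]
pi = [1.0 / (n + 1)] * (n + 1)
s = math.sqrt(1.0 / (2.0 * N * rho_true))  # per-quadrature noise std. dev. implied by (6)
state, r = 0, []
for t in range(T):
    state = random.choices(range(n + 1), weights=P[state])[0]
    r.append([abs(complex((A_true / 2.0 if state == c else 0.0) + random.gauss(0.0, s),
                          random.gauss(0.0, s))) for c in range(1, n + 1)])
A_hat, rho_hat = em_estimate(r, N, pi, P, A0=2.0, rho0=0.5)
```

The bracket excludes $A = 0$ deliberately. Because $I_1(0) = 0$, the trivial root always satisfies (26), and a solver started at zero would stop there.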




THE  SPECIAL  CASE  OF  ONE  DFT  CELL  IN  THE  GATE 


It is interesting to examine the estimation equations in the
special case where no tracking is required. If the Markov chain has
one state corresponding to a DFT cell and no zero state, the signal
must always be present in the only state of the Markov chain. Thus,
$\gamma_t(1) = 1$ for all time $t$ and for all choices of signal amplitude $A$ and
noise parameter $\rho$. Similarly, $G_{op} = 1$. The EM algorithm is not
iterative because there is no "missing data" in this special case.
(N.B. The EM algorithm is equivalent to maximum likelihood estimation
when there is no "missing data.") Substituting $\gamma_t(1) = G_{op} = 1$ and $\varphi'_{1t} =
1/T$ into the estimation equations (22) and (25) gives


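$$A = \frac{2}{T} \sum_{t=1}^{T} r_t \, \frac{I_1(A r_t N\rho)}{I_0(A r_t N\rho)} \qquad (27)$$

$$\frac{\sigma^2}{N} = \frac{1}{N\rho} = \alpha^2 - \frac{A^2}{4} \qquad (28)$$

where

$$\alpha^2 = \frac{1}{T} \sum_{t=1}^{T} r_t^2 \qquad (29)$$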


and where $r_t$ is the amplitude of the only DFT cell in the gate at time $t$ (see equation (1)). Solving for the noise parameter $\rho$ in (28) and substituting into (27) gives a single nonlinear equation to be solved numerically for the signal amplitude $A$. Explicitly, this equation is


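$$A = \frac{2}{T} \sum_{t=1}^{T} r_t \, \frac{I_1\!\left(A r_t / (\alpha^2 - A^2/4)\right)}{I_0\!\left(A r_t / (\alpha^2 - A^2/4)\right)} \qquad (30)$$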


Equation (30) is a special case of equation (26), and the remarks made after (26) concerning numerical solution also apply to (30). The noise parameter $\rho$ is computed from (28) after $A$ has been computed from (30).
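For the one-cell case, the inner-outer computation reduces to the minimal Python sketch below, which assumes NumPy. The sketch computes $\alpha^2$ from (29). It then looks for a root of (30) on $0 < A < 2\alpha$, the range on which $\alpha^2 - A^2/4 > 0$ in (28), using a grid scan followed by bisection. Finally it returns $N\rho$ from (28). Note that $A = 0$ always satisfies (30) because $I_1(0) = 0$, which is one source of the multiple solutions mentioned after (26). Starting the scan just above zero is one assumed way of skipping that trivial root. The names `bessel_ratio`, `residual_30` and `single_cell_ml` are illustrative. The ratio $I_1/I_0$ is evaluated from the integral representations (A.3)-(A.4) of the Appendix, so the sketch needs only NumPy.

```python
import numpy as np

# Trapezoid-rule nodes on [0, pi] for the integral representations (A.3)-(A.4).
_THETA = np.linspace(0.0, np.pi, 1025)
_W = np.full(_THETA.size, _THETA[1] - _THETA[0])
_W[[0, -1]] *= 0.5  # trapezoid-rule end weights


def bessel_ratio(x):
    """I1(x)/I0(x) for x >= 0, evaluated from the integrals (A.3)-(A.4).

    Both integrands are multiplied by exp(-x); the factor cancels in the
    ratio and keeps large arguments from overflowing.
    """
    x = np.asarray(x, dtype=float)
    e = np.exp(np.multiply.outer(x, np.cos(_THETA) - 1.0))
    return (e * np.cos(_THETA)) @ _W / (e @ _W)


def residual_30(A, r, alpha2):
    """Left side minus right side of equation (30); zero at a solution A."""
    s = alpha2 - A**2 / 4.0          # equals 1/(N rho) by equation (28)
    return A - 2.0 * np.mean(r * bessel_ratio(A * r / s))


def single_cell_ml(r, n_grid=100, n_bisect=60):
    """Estimate (A, N*rho) for the one-cell gate from amplitudes r_1..r_T.

    Scans (0, 2*alpha) for the first sign change of the residual, skipping
    the trivial root A = 0, then refines it by bisection. Returns None if no
    sign change is found (a hypothetical policy, not from the text).
    """
    r = np.asarray(r, dtype=float)
    alpha2 = np.mean(r**2)                              # equation (29)
    A_max = 2.0 * np.sqrt(alpha2)                       # keeps alpha^2 - A^2/4 > 0
    grid = np.linspace(0.0, A_max, n_grid + 2)[1:-1]    # open interval (0, A_max)
    vals = np.array([residual_30(A, r, alpha2) for A in grid])
    idx = np.nonzero((vals[:-1] < 0.0) & (vals[1:] >= 0.0))[0]
    if idx.size == 0:
        return None
    lo, hi = grid[idx[0]], grid[idx[0] + 1]
    for _ in range(n_bisect):                           # inner iteration: bisection
        mid = 0.5 * (lo + hi)
        if residual_30(mid, r, alpha2) < 0.0:
            lo = mid
        else:
            hi = mid
    A = 0.5 * (lo + hi)
    return A, 1.0 / (alpha2 - A**2 / 4.0)               # equation (28) gives N*rho
```

On simulated Rician amplitudes with a clearly present signal (for example $A = 4$, $N\rho = 1$, $T = 1000$), the estimates should land near the true values. On noise-only data the scan may find no sign change, and the function then returns `None`.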

The estimation equations (27)-(28) can be derived directly by maximum likelihood methods without using the HMM/AOO tracking structure. The measurement sequence $Z = (r_1, \ldots, r_T)$ is a sequence of independent realizations of a random variable whose PDF is the Rician $P_1(r|A,\rho)$ given by (6). Therefore, the posterior likelihood function of $Z$ is

$$\ell(Z|A,\rho) = \prod_{t=1}^{T} P_1(r_t|A,\rho).$$

Omitting scale factors independent of $A$ and $\rho$, the posterior likelihood function of $Z$ is written more simply as

$$\ell(Z|A,\rho) = \rho^{T} \exp\!\left(-\frac{N\rho T A^2}{4}\right) \exp\!\left(-N\rho T \alpha^2\right) \prod_{t=1}^{T} I_0(A N\rho\, r_t). \qquad (31)$$

Differentiating $\ell$ with respect to $A$ and $\rho$, setting the result to zero, and simplifying gives the necessary conditions (27)-(28) for maximum likelihood estimates of $A$ and $\rho$.

Equations  (27)  and  (28)  are  natural  to  the  frequency  line  SNR 
estimation  problem.  Related  results  for  optimal  detection  problems 
have  been  discussed  elsewhere.  Helstrom  [9,  Chapter  VII,  Section  1(b)] 
examines  the  optimum  likelihood-ratio  receiver  for  the  sequential 
frequency  line  detection  problem,  and  gives  several  references  to 
earlier  work. 




CONCLUDING  REMARKS 


The estimation equation (25) for noise power in a bin, $\sigma^2/N$, is interpreted as the sample mean of the square of all the measured amplitudes in $Z$, corrected by a term representing the signal power smeared over the entire gate and over the full time history $T$. This interpretation of the correction term is reasonable because the division by $n$ smears the signal over the DFT cells in the gate, and the multiplication by $G_{op}$ smears the signal over time. Since $G_{op}$ estimates the fraction of the time interval $T$ during which the signal occupies the gate, the estimation equation (22) for signal amplitude $A$ is interpreted as an estimate of signal amplitude when signal is present in the gate. Without the $G_{op}$ factor, the estimate of $A$ would be biased low by signal absence.

The estimation equation (22) for $A$ is a weighted average of the measured amplitudes $\{r_t^i\}$ over the full measurement sequence $Z$. The weight applied to an amplitude $r_t^i$ is interesting because it is a product of a factor representing the "global" properties of the detector/tracker HMM/AOO and a factor representing the "local" statistical properties of the measured amplitude in a DFT cell. The global properties are those associated with the probability field $\{\gamma_t(i)\}$, and they must be computed by the recursions (2)-(4). The local properties are those associated with the ratio of Bessel functions (discussed in the Appendix), and they are computed easily from knowledge of the measured amplitude in each DFT cell.

An alternative to solving the estimation equations (22) and (25) is to use them simply as estimators for $A$ and $\sigma^2/N$. If successful, this would obviate the need to solve the equations numerically. Another alternative is to estimate the noise power using data in DFT cells that presumably contain no signal. If this is done, the estimation equation (25) can be eliminated, although equation (22) must still be solved. In any event, further study of these equations is merited.




APPENDIX 

The function $I(x)$, defined by the Bessel function ratio

$$I(x) = \frac{I_1(x)}{I_0(x)}, \qquad (A.1)$$

is of independent interest. $I(x)$ is an odd function because $I_1(x)$ is odd and $I_0(x)$ is even. The function $I(x)$ is plotted in Figure 2 for $x \ge 0$. Using the identities $I_0'(x) = I_1(x)$ and $I_1'(x) = I_0(x) - I_1(x)/x$, the derivative of $I(x)$ is given explicitly by

$$I'(x) = 1 - \frac{I(x)}{x} - I^2(x). \qquad (A.2)$$

$I'(x)$ is plotted in Figure 3. The apparent singularity of $I'(x)$ at $x = 0$ is removable, since $I(x) \approx x/2$ asymptotically as $x \to 0$. Hence $I(x)/x \to 1/2$ and $I^2(x) \to 0$, so $I'(0) = 1/2$. From Figures 2 and 3, it appears that, for $x \ge 0$, $I(x)$ is a cumulative distribution function (CDF) and $I'(x)$ is a PDF. A formal proof of this fact (due to A. H. Nuttall) is given in this Appendix. From the identity [8, equation (9.6.19)]

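$$I_0(x) = \frac{1}{\pi} \int_0^{\pi} \exp(x\cos\theta)\,d\theta, \qquad (A.3)$$

it follows by differentiation that

$$I_0'(x) = I_1(x) = \frac{1}{\pi} \int_0^{\pi} \cos\theta\,\exp(x\cos\theta)\,d\theta, \qquad (A.4)$$

$$I_0''(x) = I_1'(x) = \frac{1}{\pi} \int_0^{\pi} \cos^2\theta\,\exp(x\cos\theta)\,d\theta. \qquad (A.5)$$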

Differentiating $I(x)$ in (A.1) gives $I'(x) = N(x)/I_0^2(x)$, where $N(x) = I_0(x)\,I_1'(x) - I_0'(x)\,I_1(x)$. Substituting $I_0'(x) = I_1(x)$ and the identities (A.3)-(A.5) into the expression for $N(x)$ gives the result



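$$\pi^2 N(x) = \int_0^{\pi} \exp(x\cos\theta)\,d\theta \int_0^{\pi} \cos^2\theta\,\exp(x\cos\theta)\,d\theta - \left[\int_0^{\pi} \cos\theta\,\exp(x\cos\theta)\,d\theta\right]^2. \qquad (A.6)$$

Fix $x \ge 0$. Substituting into (A.6) the functions

$$a(\theta) = \exp[(x\cos\theta)/2], \qquad b(\theta) = \cos\theta\,\exp[(x\cos\theta)/2],$$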


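and applying the Cauchy-Schwarz inequality, $\left(\int_0^{\pi} a\,b\,d\theta\right)^2 \le \int_0^{\pi} a^2\,d\theta \int_0^{\pi} b^2\,d\theta$, shows that $N(x) \ge 0$. Equality holds in the Cauchy-Schwarz inequality if and only if, for some constant $c$, $a(\theta) = c\,b(\theta)$ for all $\theta$ in the interval $[0, \pi]$. No such constant exists, because $a(\theta) > 0$ for every $\theta$ while $b(\pi/2) = 0$, and $b(\theta)/a(\theta) = \cos\theta$ is not constant. Hence $N(x)$ cannot equal 0. It follows that $I'(x) = N(x)/I_0^2(x) > 0$. Therefore, $I(x)$ is strictly increasing for $x \ge 0$. From the asymptotic result, valid for fixed $\nu$, [8, equation (9.7.1)]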

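$$I_\nu(x) \sim \frac{e^{x}}{\sqrt{2\pi x}} \left(1 - \frac{4\nu^2 - 1}{8x} + \cdots\right) \qquad (A.7)$$

it follows that $I(x) \to 1$ as $x \to \infty$. Since, in addition, $I(0) = 0$ and $I(x)$ is strictly increasing, $I(x)$ is a CDF on $[0, \infty)$.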


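The mean value of $I'(x)$ does not exist because, like the Cauchy density, $I'(x)$ has such a heavy tail that its first moment is infinite. The heavy tail is evident from Figure 2. A proof of the unboundedness of the mean value follows from the asymptotic formula (A.7). Taking $\nu = 0$ and $\nu = 1$ gives $I(x) \approx 1 - 1/(2x)$ for large $x$. Then $I'(x) \approx 1/(2x^2)$ and $x\,I'(x) \approx 1/(2x)$, whose integral over $[1, \infty)$ diverges.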
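The claims in this Appendix are easy to spot-check numerically. The NumPy sketch below evaluates $I(x)$ by trapezoidal quadrature of the integrals (A.3)-(A.4), with the common factor $e^{-x}$ folded in to avoid overflow. This quadrature scheme is an illustrative choice and is not taken from the report. The checks compare (A.2) against a finite difference and confirm the limit $I'(0) = 1/2$, the monotonicity of $I(x)$, and the large-$x$ behaviour $1 - I(x) \approx 1/(2x)$.

```python
import numpy as np

# Trapezoid-rule nodes on [0, pi]; the integrands are smooth and periodic,
# so this simple rule converges very quickly.
_THETA = np.linspace(0.0, np.pi, 2049)
_W = np.full(_THETA.size, _THETA[1] - _THETA[0])
_W[[0, -1]] *= 0.5  # trapezoid-rule end weights


def ratio_I(x):
    """I(x) = I1(x)/I0(x), equation (A.1), from the integrals (A.3)-(A.4).

    The common factor exp(-x) is folded into both integrands to avoid overflow.
    """
    x = np.asarray(x, dtype=float)
    e = np.exp(np.multiply.outer(x, np.cos(_THETA) - 1.0))
    return (e * np.cos(_THETA)) @ _W / (e @ _W)


def ratio_I_prime(x):
    """I'(x) from equation (A.2); the removable singularity gives I'(0) = 1/2."""
    x = np.asarray(x, dtype=float)
    I = ratio_I(x)
    with np.errstate(divide="ignore", invalid="ignore"):
        d = 1.0 - I / x - I**2
    return np.where(x == 0.0, 0.5, d)
```

For example, `ratio_I(1.0)` should agree with $I_1(1)/I_0(1) \approx 0.44639$. The quantity `2 * x * (1 - ratio_I(x))` should approach 1 as `x` grows, which matches the tail argument above.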




Figure 1. Plots of $P_1(r|A,\rho)$ for $N\rho = 1$ and values of $A$ corresponding to the stated values of signal-to-noise ratio.

Signal power = -∞ dB, A = 0
Signal power = -10 dB, A = 0.447214
Signal power = -5 dB, A = 0.795271
Signal power = -4 dB, A = 0.892308
Signal power = -3 dB, A = 1.00119
Signal power = -2 dB, A = 1.12335
Signal power = -1 dB, A = 1.26042
Signal power = 0 dB, A = 1.41421
Signal power = 1 dB, A = 1.58677
Signal power = 2 dB, A = 1.78039
Signal power = 3 dB, A = 1.99763
Signal power = 4 dB, A = 2.24138
Signal power = 5 dB, A = 2.51487
Signal power = 10 dB, A = 4.47214
Signal power = 15 dB, A = 7.95271
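The legend values are consistent with $A = \sqrt{2\,\mathrm{SNR}/(N\rho)}$, that is, with SNR taken as $N\rho A^2/2$. For example, 0 dB gives $A = \sqrt{2} \approx 1.41421$. This relation is inferred from the numbers in the legend and is not stated in the text. The NumPy sketch below reproduces the legend and evaluates the normalized Rician density whose $A$- and $\rho$-dependent factors appear in (31). Its normalizing factor $2N\rho r$ is a reconstruction, because (6) is not shown in this section.

```python
import numpy as np


def amplitude_for_snr_db(snr_db, n_rho=1.0):
    """Amplitude A whose SNR, taken as N*rho*A**2/2 (inferred), equals snr_db dB."""
    snr = 10.0 ** (np.asarray(snr_db, dtype=float) / 10.0)
    return np.sqrt(2.0 * snr / n_rho)


def rician_pdf(r, A, n_rho=1.0):
    """2 N rho r exp(-N rho (r^2 + A^2/4)) I0(A N rho r) for r >= 0.

    Normalized form of the per-sample factor in (31). np.i0 overflows once
    A * n_rho * r exceeds roughly 700, so keep the arguments moderate.
    """
    r = np.asarray(r, dtype=float)
    return (2.0 * n_rho * r * np.exp(-n_rho * (r**2 + A**2 / 4.0))
            * np.i0(A * n_rho * r))
```

Evaluating `rician_pdf` on a grid of `r` values for each legend amplitude should give curves of the kind plotted in Figure 1, each integrating to 1.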



REFERENCES 


1.  R.  F.  Barrett  and  R.  L.  Streit,  "Automatic  Detection  of  Frequency 
Modulated  Spectral  Lines,"  Proceedings  of  the  Australian  Symposium  on 
Signal  Processing  and  Its  Applications,  Adelaide,  Australia,  17-19 
April  1989. 

2.  A.  K.  Steele,  R.  L.  Streit,  and  R. F.  Barrett,  "Nonlinear  Frequency 
Line  Tracking  Algorithms,"  Proceedings  of  the  Australian  Symposium  on 
Signal  Processing  and  Its  Applications,  Adelaide,  Australia,  17-19 
April  1989. 

3.  R.  L.  Streit,  "Frequency  Line  Detector/Tracker  Using  Hidden  Markov 
Models  With  Amplitude  Information,"  NUSC  Technical  Memorandum  No. 
911143,  20  June  1991. 

4. R. L. Streit and R. F. Barrett, "Frequency Line Tracking Using Hidden Markov Models," IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-38 (1990), 586-598.

5. A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Series B, 39 (1977), 1-38.

6. L. E. Baum, "An Inequality and Associated Maximization Technique in Statistical Estimation for Probabilistic Functions of Markov Chains," in Inequalities III, O. Shisha (Editor), Academic Press, New York, 1972, 1-8.

7. J. L. Muñoz and R. L. Streit, "Connection Machine Implementation of Hidden Markov Models for Frequency Line Tracking," in Very Large Scale Computation in the 21st Century (J. P. Mesirov, Editor), Society for Industrial and Applied Mathematics, Philadelphia, 1991, 204-217.

8. Handbook of Mathematical Functions, M. Abramowitz and I. A. Stegun (Editors), AMS 55, National Bureau of Standards, 1964 (Tenth printing 1972).

9. C. W. Helstrom, Statistical Theory of Signal Detection, Second Edition, Pergamon Press, New York, 1968.




Automatic  Detection  Of 
Frequency  Modulated 
Spectral  Lines 


R.  F.  Barrett  and  R.  L.  Streit 




Automatic  Detection  of  Frequency  Modulated  Spectral  Lines 

R.F.  BARRETT 

Maritime Systems Div., Weapons Systems Research Laboratory, Salisbury, SA.

R.L.  STREIT 

Naval  Underwater  Systems  Center,  New  London,  Connecticut,  U.S.A 


SUMMARY Three methods for the detection of narrowband signals of unstable frequency embedded in a white Gaussian noise background are investigated. The first method, known as the Maximum-Power-Track (MPT) detector, divides the frequency domain into gates containing a fixed number of FFT cells. The maximum spectral power within the gate is integrated over time and used for detection purposes. The MPT detector is fast and easy to implement. The other detectors extend the Hidden Markov Model (HMM) frequency tracker developed earlier by Streit and Barrett (1988) so that it can perform detection. The HMM trackers have also been extended to include amplitude information in the input measurement sequence. We find that the MPT detector is a simple and effective detector, and that the extended HMM detectors represent a significant improvement, albeit at the cost of increased computational complexity.


1. INTRODUCTION

The detection of stable frequency lines embedded in white Gaussian noise is a well-studied problem in detection theory. In this case, the optimal detector is the conventional integrated power detector. In the usual implementation, the sampled time series is divided into non-overlapped data blocks, and Fast Fourier Transforms (FFTs) of each block are computed. The size of the blocks is chosen to give the desired frequency resolution for the problem under study. It is assumed that the signal frequency is stable enough that the signal remains entirely within the same FFT frequency cell from data block to data block. The spectral power within each FFT cell is then summed over all the data blocks, and a detection is registered whenever the integrated power in one cell exceeds a predetermined detection threshold. This threshold is set by considering the case when only noise is present. The Probability Distribution Function (PDF) of the integrated noise power is assumed to be Gaussian. When the noise is not white Gaussian, the noise background must first be "pre-whitened" by filtering, so that the filtered noise has the desired statistical characteristics (e.g., unit variance, white).
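The conventional integrated power detector described above can be sketched in NumPy as follows. The series is cut into non-overlapped blocks of size N, each block is transformed with an FFT, and the spectral power in each cell is summed over the blocks before thresholding. The function names and the use of a one-sided real FFT (`np.fft.rfft`) are illustrative choices. Setting the threshold from noise-only data is left to the caller.

```python
import numpy as np


def integrated_power(x, N):
    """Spectral power per FFT cell, summed over non-overlapped blocks of size N."""
    x = np.asarray(x, dtype=float)
    T = x.size // N                          # number of complete data blocks
    blocks = x[: T * N].reshape(T, N)        # any incomplete final block is dropped
    spectra = np.fft.rfft(blocks, axis=1)    # one FFT per block
    return np.sum(np.abs(spectra) ** 2, axis=0)


def conventional_detect(x, N, threshold):
    """Indices of cells whose integrated power exceeds a preset detection threshold."""
    return np.nonzero(integrated_power(x, N) > threshold)[0]
```

This approach works only if the line stays in one cell. If the line wanders across cells, its power is spread over those cells in `integrated_power`, which is the failure mode discussed next.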

In  the  case  when  the  frequency  of  the  signal  to  be 
detected  is  unstable  (i.e.,  the  signal  frequency 
wanders  over  several  FFT  frequency  cells),  the 
detection  performance  of  the  conventional  integrated 
power  detector  becomes  degraded.  Instead  of  the 
spectral  power  being  concentrated  in  one  frequency 
cell  from  data  block  to  data  block,  it  becomes 
smeared  over  several  cells  in  the  frequency  "gate". 
The  integrated  signal  power  in  individual  cells 
within  the  gate  may  then  not  exceed  the  detection 
threshold,  and  a  missed  detection  will  ensue. 

In  this  paper,  we  compare  three  methods  for  the 
detection  of  unstable  frequencies  embedded  in  white 
Gaussian  noise.  The  first  method,  discussed  in 
Section  2,  is  known  as  the  Maximum-Power-Track  (MPT) 
detector,  and  represents  a  simple  but  effective 
extension  of  the  conventional  detector.  The  other 
two methods are based on the Hidden Markov Model (HMM). They extend earlier work by Streit and Barrett (1988) on the use of the HMM for frequency line tracking, and they are discussed in Section 3. A comparison of the results from the three approaches is presented in Section 4.


2. THE MAXIMUM-POWER-TRACK DETECTOR

The MPT detector is a straightforward attempt to improve on the poor performance of the conventional detector for fluctuating signal frequencies. A gate of frequency cells is defined, with the number M of cells in the gate chosen large enough to encompass the extremes of the frequency meanders. For each of the T blocks of data, the cell containing the maximum spectral power is selected from all cells within the gate. A measurement threshold is set for the spectral power. If the power in the selected cell exceeds this measurement threshold, the cell's power is added to a sum that accumulates the total power in the selected cells over the window of T data blocks. The accumulated total power is designated the "gate power". A detection is registered whenever the gate power exceeds a detection threshold. This threshold is set by estimating the gate power in gates containing only noise, in the same manner as for a conventional detector. The conventional detector is the special case of the MPT detector with M = 1.
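The MPT rule takes only a few lines of NumPy, assuming the spectral powers for one gate are already available as a T-by-M array with one row per data block. The names `mpt_gate_power` and `mpt_detect` are hypothetical. With M = 1 the gate power reduces to the conventional integrated power of a single cell, as the text notes.

```python
import numpy as np


def mpt_gate_power(power, D2=0.0):
    """Maximum-Power-Track gate power for one gate.

    power: array of shape (T, M), the spectral power in each of the M gate
           cells for each of the T data blocks.
    D2:    measurement power threshold; a block's peak is added only if it
           exceeds D2.
    """
    power = np.asarray(power, dtype=float)
    peak = power.max(axis=1)                 # cell with maximum power in each block
    return float(np.sum(peak[peak > D2]))    # accumulate peaks above the threshold


def mpt_detect(power, detection_threshold, D2=0.0):
    """Register a detection when the gate power exceeds the detection threshold."""
    return mpt_gate_power(power, D2) > detection_threshold
```

Because only the per-block maximum is kept, the sketch needs no explicit track estimate. The sequence of selected cells plays that role implicitly.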

The simplicity of the MPT detector makes a statistical analysis of its performance possible. In the analysis, we assume that in each data block $\ell$ a signal of amplitude A is present at the centre of the m-th FFT frequency cell. The background noise is assumed to be white, zero-mean Gaussian, with a broadband noise power of $\sigma^2$. The signal frequency lies in different FFT cells at different times (i.e., m depends on $\ell$). The FFT data blocks are of size N.

Under  these  assumptions,  the  PDF  F(p)  of  the 
spectral  power  contribution  p  of  each  FFT  data  block 
to  the  total  gate  power  is  given  by: 




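$$F(p) = \delta(p) \left[\int_0^{D^2} P_2(e)\,de\right]^{M-1} \int_0^{D^2} P_1(e)\,de \qquad \text{for } p \le D^2, \qquad (1a)$$

$$F(p) = P_1(p) \left[\int_0^{p} P_2(e)\,de\right]^{M-1} + (M-1)\,P_2(p) \left[\int_0^{p} P_2(e)\,de\right]^{M-2} \int_0^{p} P_1(e)\,de \qquad \text{for } p > D^2, \qquad (1b)$$

where $D^2$ is the measurement power threshold, and $\delta(x)$ is Dirac's delta function.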


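In Equation 1, $P_1(e)$ is the PDF for the spectral power in the cell containing the signal, and $P_2(e)$ is the corresponding PDF in cells containing no signal. The PDFs $P_1(e)$ and $P_2(e)$ are given by:

$$P_1(e) = \frac{N}{\sigma^2} \exp\!\left[-\frac{N(4e + A^2)}{4\sigma^2}\right] I_0\!\left(\frac{A N \sqrt{e}}{\sigma^2}\right) \qquad (2a)$$

and

$$P_2(e) = \frac{N}{\sigma^2} \exp\!\left(-\frac{N e}{\sigma^2}\right) \qquad (2b)$$

where $I_0(x)$ is the zeroth-order modified Bessel function of the first kind.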

The gate power is obtained by summing T independent realisations of the stochastic variable p. When T is large, the Central Limit Theorem makes the gate power approximately Gaussian, and its mean and variance are readily calculated from Equations 1 and 2. The performance of the MPT detector can then be studied as a function of the measurement power threshold D by numerically evaluating Equations 1 and 2. Such a study showed that the optimal value of D for this detector is zero.
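Equations 1 and 2 can be evaluated numerically along the lines described above. The NumPy sketch below integrates (1b) on a uniform grid to obtain the mean and variance of one block's contribution p. The delta function in (1a) sits at p = 0, so it contributes nothing to either moment. The sketch then applies the Gaussian approximation for the T-block gate power to convert a false alarm probability into a detection probability. The grid size, the integration cutoff and the function names are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist


def _cumtrapz(f, x):
    """Running trapezoid-rule integral of f over x, starting from 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))


def block_power_moments(A, sigma2, N, M, D2=0.0, n_grid=200_001):
    """Mean and variance of one block's gate-power contribution p, Equation (1).

    The delta function in (1a) sits at p = 0, so it adds nothing to E[p] or
    E[p^2]; only the continuous part (1b) on p > D2 is integrated. Intended
    for moderate A*N/sigma2, where np.i0 does not overflow.
    """
    c = N / sigma2
    e = np.linspace(0.0, A**2 / 4.0 + 60.0 / c, n_grid)   # assumed cutoff deep in both tails
    P1 = c * np.exp(-c * (4.0 * e + A**2) / 4.0) * np.i0(A * c * np.sqrt(e))  # (2a)
    P2 = c * np.exp(-c * e)                                                    # (2b)
    C1, C2 = _cumtrapz(P1, e), _cumtrapz(P2, e)
    if M == 1:
        F = P1.copy()                                      # conventional detector
    else:
        F = P1 * C2 ** (M - 1) + (M - 1) * P2 * C2 ** (M - 2) * C1             # (1b)
    F[e <= D2] = 0.0
    m1 = _cumtrapz(e * F, e)[-1]
    m2 = _cumtrapz(e**2 * F, e)[-1]
    return m1, m2 - m1**2


def gaussian_pd(pfa, T, noise_moments, signal_moments):
    """P_D at a given P_FA when the T-block gate power is treated as Gaussian."""
    (m0, v0), (m1, v1) = noise_moments, signal_moments
    z = NormalDist()
    tau = T * m0 + np.sqrt(T * v0) * z.inv_cdf(1.0 - pfa)   # threshold from noise only
    return 1.0 - z.cdf((tau - T * m1) / np.sqrt(T * v1))
```

A useful sanity check is the noise-only case. With A = 0 and D = 0, p is the maximum of M independent exponential variables, so its mean is $(\sigma^2/N)\sum_{k=1}^{M} 1/k$. For M = 4 and $N/\sigma^2 = 1$ this mean is 25/12.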

Section 4 presents the Receiver Operating Characteristic (ROC) curves and the minimum total error versus SNR curves for the MPT detector. Section 4 also compares its detection performance with that of the HMM detectors described next in Section 3.

3.  DETECTION  BASED  ON  HMMs 

The HMM trackers presented by Streit and Barrett (1988) and Barrett et al. (1988) for tracking time-varying frequency lines are post-measurement device trackers. The FFTs of a sequence of blocked, non-overlapped sampled time series data are passed through a gated threshold detector similar to that of the MPT detector. The detector output is the measured frequency of the signal. Specifically, the measurement is the midpoint of the FFT cell within the gate that has the largest amplitude, provided this amplitude exceeds the measurement threshold. If the threshold is not exceeded, the measurement is defined to be the "zero state". Thus, a measurement is made at every time step. Because of the zero state, the HMM trackers have the intrinsic capability of automatic track initiation and termination.


The HMM trackers presented by Streit and Barrett (1988) and Barrett et al. (1988) do not use phase or amplitude information, yet they are effective trackers. We have included amplitude information in the HMM tracker used in this paper. This extension is straightforward and does not materially alter the probabilistic methods used in the HMM tracker. However, the addition of amplitude information gives better detection performance at lower SNRs than the original HMM tracker described by Streit and Barrett (1988) and Barrett et al. (1988). The HMM tracker with amplitude information included is denoted by the acronym HMM/A, and its tracking performance is discussed by Steele et al. (1989).

The HMM trackers (either with or without amplitude information) can reconstruct the signal frequency track in several different ways, using different features of a general probabilistic structure defined by the mathematics of HMMs. Two features are of interest in this paper. The first is the posterior likelihood function, which gives the likelihood of the entire measurement sequence. The second is the gate occupancy probability (GOP) function, which gives the likelihood that the signal does not occupy the zero state (i.e., the likelihood of signal presence at each time step). Complete details are given by Streit and Barrett (1988).

The first HMM/A detector we discuss is the log-likelihood ratio (LLR/A) detector. The likelihood functions used are the ones that the HMM/A tracker defines under the "signal present" and "signal absent" hypotheses. The "signal present" hypothesis is true if the measurement sequence is a realisation of the synthetic signal source characterised by the detector's HMM. The "signal absent" hypothesis is true if the sequence is a realisation of a synthetic noise source characterised by the zero state of the detector's HMM. The parameters defining the detector's HMM are the assumed SNR, the process noise standard deviation d, the track initiation probability u and the track termination probability v. The LLR/A detector is optimal in the Neyman-Pearson sense; i.e., for a given probability of false alarm, the probability of signal detection is a maximum. We stress that the LLR/A detector is optimal if and only if the posterior likelihood function defined by the HMM/A tracker is exactly matched to the signal. Mismatch necessarily makes the LLR/A detector sub-optimal.

The second detector is the GOP/A detector, so named because it integrates the GOP/A function that the HMM/A tracker uses to decide on track initiation and termination at each time step. When the HMM/A tracker initiates a track automatically, it is effectively registering a detection. The GOP/A detector therefore uses the intrinsic signal detector of the HMM/A tracker. Because this detector is an integrator, its output can be interpreted as an estimate of the fraction of the total measurement time interval during which a signal with certain specified values of the HMM parameters (SNR, d, u and v) is present. Analysis of the detection performance is difficult because the GOP/A function values are highly correlated from time sample to time sample, since the HMM/A tracker is a fixed interval tracker. The GOP/A detector is not equivalent to the LLR/A detector, and no optimality criterion is known for it.

Analytic expressions for the ROC curves of the LLR/A and GOP/A detectors are unknown. Preliminary theoretical results suggest, however, that the underlying conditional PDFs for the relevant statistics, under the hypotheses "signal present" and "signal absent", are asymptotically log-normal for both detectors. Further support for this comes from the ROC curves obtained below by simulation, which are straight lines on normal probability paper.

It  will  be  seen  in  Section  4  that  the  GOP/A  detector 
is  superior  to  the  LLR/A  detector.  Since  the  LLR/A 
uses  an  optimal  detection  statistic,  the  question 
naturally  arises  as  to  how  the  LLR/A  detector  can  be 
inferior  to  the  GOP/A  detector.  The  answer  is  that 
the  detector  HMM,  used  in  the  LLR/A  to  characterise 
the  "signal  present"  hypothesis,  is  not  matched  to 
the  HMM  used  to  simulate  the  input  measurement 
sequence. The LLR/A detector is therefore sub-optimal for the simulated data.

4. RESULTS

Of  the  different  detectors  discussed  above,  only  the 
performance  of  the  MPT  detector  lends  itself  to 
theoretical  analysis.  Consequently,  the  detectors 
are  compared  in  this  paper  by  using  simulation.  The 
comparisons  are  made  on  the  basis  of  ROC  curves  at 
different SNRs (Fig. 1) and plots of the minimum
total  error  rate  as  a  function  of  SNR  (Fig.  2) .  For 
any  SNR,  as  the  signal  detection  threshold  is  raised 
or  lowered,  the  relative  frequency  of  occurrence  of 
false  alarms  and  false  dismissals  is  varied.  There 
exists  an  optimal  detection  threshold  for  which  the 
combined  sum  of  these  two  sources  of  error  is 
minimised.  Fig.  2  displays  this  minimum  error  rate 
as  a  function  of  SNR. 
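Given detector outputs from simulated "signal absent" and "signal present" runs, the minimum total error can be found by sweeping the detection threshold over every observed value. The minimal NumPy sketch below assumes that a detection is declared when the statistic is at or above the threshold. It also assumes that false alarms and false dismissals are weighted equally, as in the plain sum described above.

```python
import numpy as np


def minimum_total_error(stat_absent, stat_present):
    """Minimum over thresholds of P_FA + P_miss, and a threshold attaining it.

    A detection is declared when the statistic is >= tau; tau = +inf
    (never detect) is included so that P_FA = 0 is always reachable.
    """
    s0 = np.sort(np.asarray(stat_absent, dtype=float))
    s1 = np.sort(np.asarray(stat_present, dtype=float))
    taus = np.append(np.unique(np.concatenate((s0, s1))), np.inf)
    # false alarms: noise-only statistics at or above tau
    p_fa = 1.0 - np.searchsorted(s0, taus, side="left") / s0.size
    # false dismissals: signal-present statistics below tau
    p_miss = np.searchsorted(s1, taus, side="left") / s1.size
    k = int(np.argmin(p_fa + p_miss))
    return float(p_fa[k] + p_miss[k]), float(taus[k])
```

Repeating this at each simulated SNR yields one point per SNR on a curve of the kind shown in Fig. 2.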

Simulated  measurement  sequences  were  generated  by 
using  the  HMM/A  tracker  as  a  synthetic  source.  A 
simulated  measurement  sequence  is  a  sequence  of 
measurements  at  the  output  of  the  threshold 
measurement  device  when  a  signal  of  specified 
characteristics  is  input  to  it.  The  device 
measurement  threshold  is  preselected  and  fixed. 
Thus,  to  simulate  the  "signal  present"  case,  it  is 
necessary  to  specify  only  the  track  SNR  and  the 
track  process  noise.  The  track  is  started  in  the 
middle  of  the  gate  and  is  prevented  from  terminating 
by  setting  the  track  termination  probability,  v,  to 
zero.  (Note  that  this  model  of  "signal  present" 
differs  from  that  assumed  by  the  HMM  detector  only 
in  the  underlying  track  initiation  and  termination 
probabilities.)  The  "signal  absent"  hypothesis  is 
simulated  by  starting  the  track  in  the  zero  state, 
and setting SNR = 0, u = 0 and v = 1. Thus, the
simulated  "signal  absent"  measurement  sequence 
consists  of  a  sequence  of  independent  measurements 
emitted  from  the  zero  state  of  the  HMM. 

The input data to each of the detectors comprised a set of 10000 simulated measurement sequences, each of length 100 time steps, for both the "signal present" and "signal absent" cases. The input data to all detectors were identical. The HMM/A parameters u, v and d are unchanged by the addition of amplitude information to the original HMM tracker; they have been discussed in earlier publications by Streit and Barrett (1988) and Barrett et al. (1988). The process noise standard deviation d for the simulated "signal present" measurement sequence was set to 0.333 (in FFT resolution cell widths). The parameters for the GOP detectors were: track initiation probability u = 0.00029; track termination probability v = 0.000035; process noise standard deviation d = 0.333; measurement threshold D = 0.

Fig. 1 shows the ROC curves for the MPT detector (solid lines) and for the best of the HMM detectors, the GOP/A detector (dashed line). Fig. 2 shows the minimum error as a function of SNR for the MPT detector and all of the HMM detectors investigated here (i.e., the GOP and LLR detectors, both with and without amplitude information).

The  first  conclusion  to  be  drawn  from  Fig.  2  is  that 
the  MPT  detector  is  a  very  good  detector,  and  it  is 
not  until  amplitude  information  is  included  that  the 
two  HMM  detectors  surpass  the  performance  of  the  MPT 
detector.  The  MPT  detector  represents  a  substantial 
improvement  over  the  conventional  detector,  and 
regularly  produced  correct  detections  with  simulated 
data  when  no  peaks  were  observable  in  the 
conventional  power  spectrum. 

The optimal measurement threshold for the MPT detector is zero. For the HMM/A detectors, as noted in Section 2, the optimal measurement threshold (as determined from simulation studies) is small enough that missed measurements occur only infrequently. However, for HMM detectors with no amplitude information, the optimal measurement threshold is significantly different from zero. The comparisons in Fig. 2, where all detectors have a zero measurement threshold, are therefore somewhat unfair towards the last two detectors. However, optimising the measurement threshold does not improve the performance of these two detectors to the point where they equal the MPT detector. This is remarkable given the relative complexities of the three detectors, and it indicates the importance of amplitude information to the detection process.

The  inclusion  of  amplitude  information  in  the  HMM 
detectors  results  in  approximately  a  1.6  dB 
improvement  in  performance  at  an  SNR  of  0.002 
(i.e., -27  dB)  .  In  other  words,  the  SNR  would  have 
to  be  increased  by  1.6  dB  before  the  performance  of 
the  HMM  detectors  without  amplitude  information 
would equal that of the HMM/A detectors. Here, we
are  defining  SNR  to  be  the  ratio  of  signal  power  to 
broadband  noise  power.  The  best  of  the  detectors 
was  the  GOP/A  detector,  which  was  marginally 
superior  to  the  LLR/A  detector,  and  0.8  dB  superior 
to  the  MPT  detector.  From  the  ROC  curves  in  Fig.  1, 
it  can  be  seen  that  for  an  SNR  of  0.002  and  a  false 
alarm probability of 0.2%, the probability of
detection  for  the  MPT  detector  is  70%  but  is  almost 
90%  for  the  GOP/A  detector.  This  improved 
performance  is  achieved  at  the  expense  of  a 
considerably  increased  numerical  complexity. 

5. CONCLUSIONS AND CONCLUDING REMARKS

The  MPT  detector  is  recommended  in  applications 
requiring  simplicity,  ease  of  implementation,  and 
robust  detection  performance  on  unstable,  or  time- 
varying,  frequency  lines.  The  MPT  detector  is  very 
simple  to  implement,  yet  it  gives  better  detection 
performance  than  the  conventional  detector  for 
unstable  lines.  Measurement  outliers  do  not 
significantly  affect  the  MPT  detector  because  the 
track  estimate  implicit  in  the  MPT  detector  is  the 
detection sequence itself. Missed measurements do not
occur  because  numerical  evaluation  of  Equations  1 
and  2  shows  that  the  optimal  measurement  threshold 
is  zero. 

To  achieve  better  detection  performance  than  that 
given  by  the  MPT  detector,  it  seems  necessary  to 
track  the  frequency  line  accurately  and  to  include 
amplitude  information  in  the  tracker.  The  best  HMM 
detectors  studied  in  this  paper  utilize  amplitude 
information and significantly outperform the MPT detector. However, the HMM/A detectors require significantly more computational effort and are not as easy to implement as the MPT detector. The HMM/A detectors are therefore recommended in applications requiring the best possible detection performance. Two track estimates are implicit in the HMM/A tracker, although neither track estimate is used explicitly by the detection algorithm. Measurement outliers do not seriously degrade the performance of HMM/A detectors because HMM/A trackers discriminate well against outliers. Missed measurements are not a problem for HMM/A detectors because simulation shows that the best measurement threshold to use is so small that missed measurements occur infrequently.

FIG. 1 Receiver Operating Characteristic (ROC) curves for the MPT (solid lines) and GOP/A (dashed lines) detectors.

FIG. 2 The minimum error as a function of SNR for the detectors: a) GOP/A; b) LLR/A; c) MPT; d) GOP; e) LLR.

The computational burden required by HMM/A detectors adversely affects total system performance. It is possible, however, to greatly reduce these effects in some applications. For example, examination of the mathematical structure of HMM/A detection algorithms reveals that they can be formulated for several, say K, gates simultaneously, and this formulation is equivalent to a sequence of K-by-M matrix multiplications if each gate has width M. This formulation can be exploited in software, in hardware, or in both. In particular, if K = M, systolic arrays designed especially for K-by-M matrix multiplication can efficiently implement K HMM detectors in parallel. Inexpensive, commercially available, PC-level systolic arrays will probably become available in the not too distant future (see Marwood and Clarke (1988)).
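The equivalence to K-by-M matrix multiplications can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each gate runs the forward recursion of an M-state HMM, that all gates share one M-by-M transition matrix A, and that each gate's measurement likelihoods arrive as one row of a K-by-M array; the amplitude modeling of the HMM/A detectors is not reproduced, and the function names are hypothetical. One forward step for all K gates then reduces to one K-by-M times M-by-M product followed by an elementwise scaling.

```python
def matmul(X, Y):
    """Product of a K-by-M matrix X and an M-by-M matrix Y (lists of lists)."""
    return [[sum(X[r][i] * Y[i][c] for i in range(len(Y)))
             for c in range(len(Y[0]))]
            for r in range(len(X))]


def batched_forward_step(alpha, A, emis):
    """Advance K gates by one forward step using a single matrix product.

    alpha: K-by-M, row g holds the forward variables of gate g.
    A:     M-by-M transition matrix, assumed shared by all gates.
    emis:  K-by-M, emis[g][j] is gate g's measurement likelihood in state j.
    """
    pred = matmul(alpha, A)  # the K-by-M times M-by-M product
    return [[pred[g][j] * emis[g][j] for j in range(len(A))]
            for g in range(len(alpha))]


def single_forward_step(a, A, e):
    """The same forward step for one gate, for comparison."""
    M = len(A)
    return [sum(a[i] * A[i][j] for i in range(M)) * e[j] for j in range(M)]


# K = M = 3: three hypothetical gates, each of width three.
A = [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
alpha = [[1 / 3, 1 / 3, 1 / 3], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
emis = [[0.9, 0.05, 0.05], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]
batched = batched_forward_step(alpha, A, emis)
```

Each row of `batched` matches what `single_forward_step` gives for that gate alone; the point of the batched form is that the whole update is one dense matrix product, which is the operation a systolic array is built for.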

REFERENCES

Barrett, R.F., Steele, A.K., and Streit, R.L. (1988) Frequency line tracking algorithms. Proceedings of the NATO Advanced Study Institute on Underwater Acoustic Data Processing, Kingston, Ontario, Canada, 18-29 July 1988.

Marwood, W. and Clarke, A.P. (1988) A coprocessor with supercomputer capabilities for personal computers. Proceedings of the ICCD-IEEE International Conference on Computer Design: VLSI in Computers and Processors. IEEE Computer Society Press, New York, pp. 468-471.

Steele, A.K., Streit, R.L., and Barrett, R.F. (1989) Nonlinear frequency line tracking algorithms, this conference.

Streit, R.L. and Barrett, R.F. (198?) Frequency line tracking using hidden Markov models. IEEE Trans. on Acoust., Sp., and Sig. Proc., submitted.



The Moments of Matched and Mismatched Hidden Markov Models

R. L. Streit



IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 38, NO. 4, APRIL 1990


The Moments of Matched and Mismatched Hidden Markov Models

ROY L. STREIT, SENIOR MEMBER, IEEE


Abstract: An algorithm for computing the moments of matched and mismatched hidden Markov models from their defining parameters is presented. The algorithm is of general interest because it is an extension of the usual forward-backward linear recursion. The algorithm computes the joint moments of the posterior likelihood functions (i.e., the scores) by a multilinear recursion involving the joint moments of the random variables associated with the hidden states of the Markov chain. Examples comparing the first two theoretical moments to simulation results are presented. They are of independent interest because they indicate that the distributions of the posterior likelihood function scores for matched and mismatched models are asymptotically log-normal in important special cases and, therefore, are characterized asymptotically by the first two moments alone. One example discusses the effect of a noisy discrete communication channel on a suboptimal classification method based on the distributions of scores rather than on maximum likelihood classification.


I. Introduction

Hidden Markov models (HMM's) are statistical models that are developed in diverse applications to characterize different classes of nonstationary time series or signals. Subsequently, they are utilized for the automatic classification of an unknown signal into one of these signal classes. In speech applications, they are used to characterize the time variation of the short-term spectra of spoken words. An example is the speaker-independent isolated word recognition (SIIWR) problem, where HMM's characterize the words (or parts of words) in a finite-size vocabulary. Different words are characterized by different HMM's [1]. In target tracking applications, HMM's are used to characterize the time variation of a target track measurement sequence. A specific example is the narrow-band frequency line tracking problem, where HMM's characterize possible target frequency shifts as well as noise in the measurement sequence for finite signal-to-noise ratio (SNR). Different HMM's characterize different target track dynamics and different SNR's [8]. A brief description of the mathematical structure of HMM's is given at the beginning of Section II.

An application-specific preprocessor is critical to the successful use of HMM's in the application. This preprocessor maps (or transforms) an arbitrary input signal $s(t)$, $t \ge 0$, into a discrete observation sequence $\{O(t),\ t = 1, 2, \ldots\}$. Reference [1, pp. 1077-1078] gives a description of one such preprocessor for the SIIWR problem, and [8] describes one suitable for frequency line tracking. Throughout this paper, it is assumed that a satisfactory preprocessor is available, but no assumptions are made about its specific nature. The output of the preprocessor constitutes the observation sequence. In practice, this sequence is truncated to have finite length $T$, where $T$ is selected according to the application needs. The truncated sequence is denoted by $O_T = \{O(t),\ t = 1, 2, \ldots, T\}$.

Manuscript received July 11, 1987; revised January 3, 1989.

The author is with the Naval Underwater Systems Center, New London, CT 06320.

IEEE Log Number 8934008.

The act of computing specific numerical values for the various parameters of an HMM is called "training." Training takes place on the outputs of the preprocessor when it is given multiple realizations of a specific signal class. If the Baum-Welch reestimation algorithm is used for training, then training is equivalent to solving a mathematical optimization problem to determine maximum likelihood estimates of the HMM parameters [2]. In this paper, it is assumed that the training phase is completed and that the HMM's developed are adequate models for each of the signal classes of interest (e.g., the vocabulary words in the SIIWR problem or the target/SNR characteristics in the tracking application). We denote by HMM($i$) the HMM parameter set defining the $i$th signal class. An important consequence of these training assumptions is that HMM($i$) can be used as a synthetic signal source; that is, HMM($i$) can be used to simulate the output of the preprocessor when the $i$th signal is input to it. We use the notation $O_T \in \mathrm{HMM}(i)$ to mean that the observation sequence $O_T$ is a realization of a random vector whose statistical distribution is defined implicitly by HMM($i$).

HMM's are used for classification of an unknown observation sequence $O_T$ by exploiting a probability measure or posterior likelihood function, as depicted in Fig. 1. The posterior likelihood function is defined on the set of all truncated sequences $\{O_T\}$ by utilizing the mathematical structure of HMM's. Thus, the likelihood of a given $O_T$ depends critically on the numerical values of the parameters defining the underlying HMM. The $i$th HMM recognizer computes the posterior likelihood function $f_i(O_T)$. If HMM($i$) is a finite symbol HMM (see Section II below), then $f_i(O_T)$ is equivalent to a probability, that is,

$$f_i(O_T) = \Pr[O_T \mid \mathrm{HMM}(i)], \qquad i = 1, \ldots, p. \qquad (1)$$

The maximum of the computed likelihoods identifies or classifies the original signal $s(t)$ that was input to the preprocessor. It is well known that this classifier is optimum in the Neyman-Pearson sense; that is, for a specified probability of incorrect classification, the probability of correct classification is a maximum [3].

STREIT: MOMENTS OF MATCHED AND MISMATCHED HMM'S

Fig. 1. Classification of unknown signal $s(t)$ as one of $p$ signals for which trained HMM's are available.

The fundamental problem studied in this paper is the determination of the probability density function (pdf) of the test statistic $f_i(O_T)$ when $O_T \in \mathrm{HMM}(j)$. In other words, if $O_T$ is a random vector generated by HMM($j$), what is the pdf of the numerical values of the $i$th posterior likelihood function $f_i(O_T)$? Note that the HMM's are matched if $i = j$ and mismatched if $i \ne j$. This paper presents an algorithm for computing explicitly the moments of the desired pdf up to any required order directly from the underlying parameters of the HMM's involved, and presents examples that compare the first two theoretical moments to simulation results. The algorithm is of general interest because it is an extension of the usual forward-backward linear recursion [2] for HMM's. It computes the joint moments of the likelihood functions $f_i(O_T)$ by a multilinear recursion involving the joint moments of the random (observation) variables uniquely associated with the hidden states of the HMM's. The examples are of independent interest as well. First, they indicate that the desired pdf is asymptotically log-normal in important special cases and, therefore, is completely characterized (asymptotically) by the first two moments alone. It is not obvious how the central limit theorem can be used to account for this result. Second, the examples show that a suboptimal classification method using preset detection thresholds for the likelihood functions $f_i(O_T)$ may be useful in certain instances. This point is discussed at the end of this section.

The distribution we seek is defined via its cumulative distribution function (cdf), denoted by $F_{ij}(x)$. It is intuitively appealing to attempt to define $F_{ij}(x)$ by setting

$$F_{ij}(x) = \Pr[f_i(O_T) \le x \text{ and } O_T \in \mathrm{HMM}(j)]$$

where $x$ is any real number; however, such a definition is ambiguous because the meaning of the probability measure $\Pr[\cdot]$ is unclear. Instead, for finite symbol HMM's, we define

$$F_{ij}(x) = \sum_{O_T} H(x - f_i(O_T))\, f_j(O_T) \qquad (2)$$

where the function $H(\cdot)$ is defined by

$$H(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0. \end{cases}$$

From (2), it is clear that $F_{ij}(x)$ is a cdf because it is a nonnegative, nondecreasing, right-continuous function, and the limit of $F_{ij}(x)$ is 0 as $x$ goes to $-\infty$ and 1 as $x$ goes to $+\infty$. For continuous symbol HMM's, the summation over $O_T$ in (2) must be replaced by integration over $O_T$. Algorithms that calculate $F_{ij}$ directly from the HMM parameters are not known. For later reference, note that, in general, $F_{ij}(x) \ne F_{ji}(x)$.
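For very small $T$, definition (2) can be evaluated by brute force. The Python sketch below assumes two hypothetical 2-state, 2-symbol HMM's, each given as a `(pi, A, B)` triple of nested lists; it computes $f_i(O_T)$ with the standard forward recursion and adds $f_j(O_T)$ over the sequences for which $H(x - f_i(O_T)) = 1$. The enumeration over all $m^T$ sequences is exactly what makes this approach impractical for realistic $T$.

```python
from itertools import product


def likelihood(pi, A, B, obs):
    """f(O_T) = Pr[O_T | HMM] by the forward recursion; obs holds symbol indexes."""
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def cdf_F(x, hmm_i, hmm_j, m, T):
    """F_ij(x) of (2): sum over all m**T sequences of H(x - f_i(O_T)) f_j(O_T)."""
    total = 0.0
    for obs in product(range(m), repeat=T):
        if x - likelihood(*hmm_i, obs) >= 0:  # H(x - f_i(O_T)) = 1
            total += likelihood(*hmm_j, obs)
    return total


# Hypothetical example models, each given as (pi, A, B).
hmm1 = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]])
hmm2 = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]])
xs = [k / 100 for k in range(-5, 106)]
F12 = [cdf_F(x, hmm1, hmm2, m=2, T=3) for x in xs]
```

On this grid `F12` starts at 0 below $x = 0$, ends at 1 above $x = 1$, and never decreases, as the cdf properties stated above require.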

The moments of $dF_{ij}(x)$ are defined by the Riemann-Stieltjes integral

$$M_{ij}(k,T) = \int_{-\infty}^{\infty} x^k \, dF_{ij}(x), \qquad k = 0, 1, 2, \ldots \qquad (3)$$

If $F_{ij}(x)$ is differentiable with derivative $F'_{ij}(x)$, then the moments can be written equivalently as the Riemann integral

$$M_{ij}(k,T) = \int_{-\infty}^{\infty} x^k \, F'_{ij}(x) \, dx.$$

The moments depend on the length $T$ of the observation sequence because $F_{ij}(x)$ depends on $T$, as seen from (2). They uniquely determine $dF_{ij}(x)$ when they are all finite and the power series of the characteristic function of $dF_{ij}(x)$ has a positive radius of convergence [4]. For finite symbol HMM's, it is clear from (1) and (2) that $dF_{ij}(x) = 0$ for $x < 0$ and $x > 1$. Thus,

$$M_{ij}(k,T) = \int_{0}^{1} x^k \, dF_{ij}(x) \le 1$$

so that all the moments are finite. The series

$$\phi_{ij}(w) = \sum_{r=0}^{\infty} M_{ij}(r,T) \, \frac{(iw)^r}{r!}$$

for the characteristic function of $dF_{ij}(x)$ is absolutely convergent with an infinite radius of convergence because, for fixed $w_0 \ne 0$, each summand is bounded above in magnitude by $|w_0|^r / r!$, so the radius of convergence must be at least as large as $|w_0|$; since $w_0$ is arbitrary, the radius is infinite. Consequently, for finite symbol HMM's, the moments of $dF_{ij}(x)$ uniquely determine $dF_{ij}(x)$. A similar argument holds for continuous symbol HMM's, provided the likelihood functions $f_i(O_T)$ are bounded on the set of all sequences $\{O_T\}$. In this paper, we assume that the likelihood functions are bounded because such an assumption is not particularly restrictive for applications.

Receiver operating characteristic (ROC) curves [3] are commonly used in the radar and sonar communities to provide quantitative assessments of the correct and incorrect classification rates for classification schemes based on likelihood functions. ROC curves can be used for the same purpose here. To develop a ROC curve for a given classification-related test statistic, say $q$, under two hypotheses $H_i$ and $H_j$, the conditional pdf's (using the notation in [3])

$$p_{q|H_i}(Q \mid H_i) \quad \text{and} \quad p_{q|H_j}(Q \mid H_j)$$

that define the test statistic $q$ under the different hypotheses $H_i$ and $H_j$, respectively, must be known. For each real number $u$, $-\infty < u < +\infty$, we define the probability

$$P_F(u) = \int_{u}^{\infty} p_{q|H_i}(Q \mid H_i) \, dQ$$

and the probability

$$P_D(u) = \int_{u}^{\infty} p_{q|H_j}(Q \mid H_j) \, dQ.$$

The ROC curve for $q$ is simply the locus of points $(P_F(u), P_D(u))$ parameterized by $u$. The parameter $u$ is usually treated as a decision threshold in applications. Suppose the decision threshold $u_{\text{thresh}}$ is selected. Then if $q > u_{\text{thresh}}$, the classifier decides $H_j$. The probability of this decision being correct is $P_D$, and the probability that it is incorrect is $P_F$. $P_F$ and $1 - P_D$ are usually referred to as the false alarm and false dismissal probabilities, respectively. Analogous remarks pertain if $q < u_{\text{thresh}}$. Note that the ROC curve for $u = -\infty$ goes through the point $(1, 1)$ and for $u = +\infty$ it goes through the point $(0, 0)$.
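When only samples of $q$ are available (for example, from simulation), $P_F(u)$ and $P_D(u)$ can be estimated as exceedance fractions. The sketch below does this for a list of thresholds; the Gaussian samples are synthetic stand-ins for a test statistic, chosen only to make the example self-contained, and `roc_points` is a hypothetical helper name.

```python
import random


def roc_points(q_under_Hi, q_under_Hj, thresholds):
    """Empirical (P_F(u), P_D(u)) pairs from samples of q under H_i and H_j.

    The classifier decides H_j when q > u, so P_F(u) is the fraction of
    H_i samples above u and P_D(u) is the fraction of H_j samples above u.
    """
    points = []
    for u in thresholds:
        pf = sum(q > u for q in q_under_Hi) / len(q_under_Hi)
        pd = sum(q > u for q in q_under_Hj) / len(q_under_Hj)
        points.append((pf, pd))
    return points


rng = random.Random(1)
q_i = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # synthetic samples under H_i
q_j = [rng.gauss(2.0, 1.0) for _ in range(5000)]  # synthetic samples under H_j
us = [float("-inf"), 0.0, 1.0, 2.0, float("inf")]
curve = roc_points(q_i, q_j, us)
```

The first and last points are $(1, 1)$ and $(0, 0)$, matching the end points noted above, and both coordinates fall as $u$ increases.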

The ROC curve of the optimum classifier depicted in Fig. 1, under the hypotheses $O_T \in \mathrm{HMM}(i)$ and $O_T \in \mathrm{HMM}(j)$, is determined for the likelihood ratio test statistic

$$q_{\text{opt}} = f_j(O_T) / f_i(O_T).$$

The required conditional pdf for $q_{\text{opt}}$ is defined by the cdf

$$L_{ij}(x) = \sum_{O_T} H\!\left(x - \frac{f_j(O_T)}{f_i(O_T)}\right) f_j(O_T).$$

No recursion for $L_{ij}(x)$ is known, so the only way to evaluate it is by doing the summation; however, this is impractical because the number of terms in the summation grows exponentially in $T$. Simulation is probably the best method for estimating the ROC curves for the optimal test statistic $q_{\text{opt}}$. In any event, a decision threshold $u_{ij}$ must be set to enable classification to proceed. The "natural" threshold to set is $u_{ij} = 1$ for all $i$ and $j$, for then the maximum likelihood determines the classification, the classification decision is unique (except for ties), and the classifier depicted in Fig. 1 is obtained. However, in general, it is not necessary to make the natural choice. The best choice depends on the false alarm and false dismissal requirements for each pair of hypotheses $O_T \in \mathrm{HMM}(i)$ and $O_T \in \mathrm{HMM}(j)$ in the application.
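A minimal Monte Carlo sketch of the simulation approach for $q_{\text{opt}}$ follows. It draws sequences from each HMM, evaluates $q_{\text{opt}}$ with the forward recursion, and counts exceedances of the natural threshold $u = 1$. The two models (a one-state "white noise" HMM and a "sticky" two-state HMM), the sequence length, the trial count, and the seed are illustrative assumptions, and the function names are hypothetical.

```python
import random


def sample(pi, A, B, T, rng):
    """Draw O_T from an HMM (pi, A, B); the state at t = 1 follows pi."""
    state = rng.choices(range(len(pi)), weights=pi)[0]
    obs = []
    for _ in range(T):
        obs.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return obs


def likelihood(pi, A, B, obs):
    """Pr[O_T | HMM] by the forward recursion."""
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def estimate_pf_pd(hmm_i, hmm_j, T, u, trials, seed=0):
    """Monte Carlo estimates of P_F(u), P_D(u) for q_opt = f_j(O_T) / f_i(O_T)."""
    rng = random.Random(seed)

    def q_opt(obs):
        return likelihood(*hmm_j, obs) / likelihood(*hmm_i, obs)

    pf = sum(q_opt(sample(*hmm_i, T, rng)) > u for _ in range(trials)) / trials
    pd = sum(q_opt(sample(*hmm_j, T, rng)) > u for _ in range(trials)) / trials
    return pf, pd


white = ([1.0], [[1.0]], [[0.5, 0.5]])              # H_i: equiprobable symbols
sticky = ([0.5, 0.5], [[0.95, 0.05], [0.05, 0.95]],  # H_j: long symbol runs
          [[0.95, 0.05], [0.05, 0.95]])
pf, pd = estimate_pf_pd(white, sticky, T=20, u=1.0, trials=2000)
```

As the text warns, estimating very small false alarm or false dismissal probabilities this way needs far more trials than shown here.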

The ROC curve of the suboptimal classifier, under the hypotheses HMM($i$) and HMM($j$), is determined for the test statistic

$$q_{\text{subopt}} = f_j(O_T).$$

The required conditional pdf's for $q_{\text{subopt}}$ are given by $dF_{ji}(x)$ and $dF_{jj}(x)$, respectively. As shown in Section II, the moments of $dF_{ji}(x)$ and $dF_{jj}(x)$ can be computed to any desired order; hence, the ROC curve for $q_{\text{subopt}}$ can, in principle, be approximated to any required accuracy without resorting to simulation. A natural choice of decision threshold $u_{ij}$ for each pair of hypotheses $O_T \in \mathrm{HMM}(i)$ and $O_T \in \mathrm{HMM}(j)$ is not available. Instead, the thresholds must be set by direct examination of the ROC curves.

The test statistic $q_{\text{subopt}}$ is identical to $q_{\text{opt}}$ in one important special case. If HMM($i$) is such that $f_i(O_T)$ in the denominator of $q_{\text{opt}}$ is a constant function of $O_T$, then $q_{\text{subopt}}$ can be scaled so that $q_{\text{subopt}} = q_{\text{opt}}$. A situation that might require such an HMM($i$) is one in which white noise is being modeled, for then one might anticipate that all observation sequences at the output of the preprocessor are equally likely. The classification statistic is more appropriately referred to as a "detection" statistic in this instance. Thus, a ROC curve for the optimum detection statistic can be developed from the moments computed by the algorithm given in Section II.
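A small numerical check of this special case: a one-state HMM whose single state emits each of $m$ symbols with probability $1/m$ gives $f_i(O_T) = m^{-T}$ for every sequence, so $q_{\text{opt}} = m^T f_j(O_T)$. The sketch below assumes that white-noise model and a hypothetical two-state HMM($j$).

```python
from itertools import product


def likelihood(pi, A, B, obs):
    """Pr[O_T | HMM] by the forward recursion."""
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


m, T = 3, 4
white = ([1.0], [[1.0]], [[1.0 / m] * m])        # hypothetical white-noise HMM(i)
hmm_j = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]],
         [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])     # hypothetical signal HMM(j)

# f_i(O_T) takes a single value, m**-T, over all m**T sequences.
f_i_values = {likelihood(*white, obs) for obs in product(range(m), repeat=T)}

obs = (0, 2, 2, 1)
q_opt = likelihood(*hmm_j, obs) / likelihood(*white, obs)
q_subopt_scaled = m ** T * likelihood(*hmm_j, obs)  # scale factor 1 / f_i = m**T
```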

The use of $q_{\text{subopt}}$ in preference to $q_{\text{opt}}$ is appropriate only if the associated conditional pdf's for the ROC curves are "well separated" from each other, and if the application places great emphasis on control of the false alarm or false dismissal probabilities. In this situation, both $q_{\text{opt}}$ and $q_{\text{subopt}}$ are very likely to perform well; however, estimated ROC curves for $q_{\text{opt}}$ would have to be obtained from very large simulations, especially if very small false alarm probabilities or false dismissal probabilities are required in the application. On the other hand, ROC curves for $q_{\text{subopt}}$ can be obtained reliably without simulation. In any event, classification performance using $q_{\text{subopt}}$ should bound the classification performance using $q_{\text{opt}}$ from below.

II.  The  Moment  Algorithm 

Every HMM consists of two basic parts: a Markov chain and a set of random variables. The Markov chain has a finite number of states, and each state is uniquely associated with one of the random variables. The state sequence generated by the chain is not observable, i.e., the Markov chain is "hidden." At each time $t = 0, 1, 2, \ldots$, the Markov chain is assumed to be in some state; it transitions to another state at time $t + 1$ according to its transition probability matrix. At each time $t$, one observation is generated by the random variable associated with the state of the Markov chain at time $t$. The observations are referred to as symbols. If the random variables assume only a finite set of values, the HMM is referred to as a finite symbol HMM. If the random variables assume a continuum of values, the HMM is called a continuous symbol HMM. The full parameter set defining an HMM consists of the initial state probability density function of the Markov chain at time $t = 0$, the Markov chain state transition probability matrix, and the pdf's of each of the random observation variables.
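The generative structure just described can be sketched in a few lines of Python. The sketch assumes a finite symbol HMM stored as a `(pi, A, B)` triple of nested lists (symbols are row indexes of `B`), and `generate` is a hypothetical name; only the returned symbols would be visible at the preprocessor output, while the returned states are the hidden part.

```python
import random


def generate(pi, A, B, T, rng):
    """Run an HMM (pi, A, B) for T steps; return hidden states and symbols."""
    states, symbols = [], []
    state = rng.choices(range(len(pi)), weights=pi)[0]  # initial state
    for _ in range(T):
        states.append(state)
        # The symbol is drawn from the random variable tied to the current state.
        symbols.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        # Transition according to the row of A for the current state.
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return states, symbols


rng = random.Random(0)
hmm = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]],
       [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]])  # hypothetical 2-state, 3-symbol HMM
states, symbols = generate(*hmm, 10, rng)
```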

The reader is referred to [2] for further discussion of HMM's and the basic algorithms related to them. Of particular importance is the forward-backward algorithm that is used extensively in this section. It is not necessary to read the remainder of this section to understand the examples presented in Section III.




The algorithm is presented separately for finite and continuous symbol HMM's in Sections II-A and II-B, respectively. Since the presentation uses only the forward part of the forward-backward algorithm, the algorithm may be named the forward moment algorithm. Section II-C contains a statement of the backward moment algorithm and an identity that is analogous to the Baum identity of the usual forward-backward algorithm.

A.  Finite  Symbol  HMM's 

Let HMM($v$) be a hidden Markov chain with $n(v)$ states, $v = 1, 2, \ldots$. Subscripted indexes will always be written as functions of their subscripts (for instance, $n(v)$ is used instead of $n_v$) to avoid the later use of subscripted subscripts. Let the state transition probability matrix of HMM($v$) be denoted as $A^v = [a^v_{i(v),j(v)}]$ for $i(v), j(v) = 1, \ldots, n(v)$. Let the initial state probability vector of HMM($v$) be denoted as $\pi^v = [\pi^v_{i(v)}]$ for $i(v) = 1, \ldots, n(v)$.

We first restrict attention to finite symbol HMM's, that is, we suppose that every observation sequence $O_T = \{O(t),\ t = 1, \ldots, T\}$ is such that

$$O(t) \in V = \{V_1, \ldots, V_m\}$$

where $V$ is the set of all possible output symbols of the preprocessor. The true nature of the symbols in $V$ is of no importance here. HMM's assume that $O(t)$ is a random variable whose probability density function depends on the current state of the Markov chain. Let the discrete probability density function for HMM($v$) when it is in state $i(v)$ be denoted as $B^v_{i(v)}$ for $i(v) = 1, \ldots, n(v)$. Thus, each $B^v_{i(v)}$ is a row vector of length $m$. Stacking these row vectors gives the $n(v) \times m$ symbol probability matrix

$$B^v = \left[ b^v_{i(v),s} \right], \qquad i(v) = 1, \ldots, n(v), \quad s = 1, \ldots, m.$$

Note that

$$b^v_{i(v)}(V_s) = b^v_{i(v),s}$$

where we define

$$b^v_{i(v)}(O(t)) = \Pr[O(t) \mid \mathrm{HMM}(v) \text{ and state} = i(v)].$$

The assumption that the training phase is completed means that the parameters $\mathrm{HMM}(v) = (\pi^v, A^v, B^v)$ are known.

For finite symbol HMM's, $F_{ij}(x)$ has a finite number of jump discontinuities. Let $X_{ij}$ denote the set of all values of $x$ for which $F_{ij}(x)$ is discontinuous. Definition (2) implies that the discontinuities of $F_{ij}(x)$ occur precisely at the different possible values of $f_i(O_T)$. Define the subset $S_i(x)$ of the set of all observation sequences $\{O_T\}$ by

$$S_i(x) = \{O_T : f_i(O_T) = x\}.$$

The sets $S_i(x)$ and $S_i(y)$ are disjoint if $x \ne y$. Also, the union of $S_i(x)$ over all $x$ in $X_{ij}$ is the set $\{O_T\}$ of all observation sequences. Now, from definition (2), it follows that

$$dF_{ij}(x) = F_{ij}(x^+) - F_{ij}(x^-) = \sum_{O_T \in S_i(x)} f_j(O_T). \qquad (4)$$

Substituting (4) into (3) gives

$$M_{ij}(k,T) = \sum_{x \in X_{ij}} x^k \sum_{O_T \in S_i(x)} f_j(O_T) = \sum_{x \in X_{ij}} \sum_{O_T \in S_i(x)} [f_i(O_T)]^k f_j(O_T) = \sum_{O_T} \Pr[O_T \mid \mathrm{HMM}(i)]^k \, \Pr[O_T \mid \mathrm{HMM}(j)] \qquad (5)$$

where, in the last equation, we have used (1). It is clear from (5) that $M_{ij}(k,T) \ne M_{ji}(k,T)$, in general, for $k > 1$. For $k = 1$, however, we have $M_{ij}(1,T) = M_{ji}(1,T)$ for all $i$, $j$, and $T$.

The expression in (5) is computable directly from the parameters of HMM($i$) and HMM($j$); however, such a calculation is not practical except for small $T$ because the computational effort increases exponentially in $T$. To see this, note that the forward-backward algorithm [2] calculates $\Pr[O_T \mid \mathrm{HMM}(v)]$ using $n^2(v)\,T$ multiplications. Thus, each summand in (5) requires $[n(i)\,n(j)]^2 T^2$ multiplications. There are $m^T$ different possible observation sequences $O_T = \{O(t),\ t = 1, \ldots, T\}$ because each $O(t)$ can be any one of the $m$ output symbols in $V$. Thus, direct calculation of (5) requires a total of $[n(i)\,n(j)]^2 T^2 m^T$ multiplications.
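A brute-force sketch of (5), usable only for tiny $T$, makes the exponential cost concrete: the loop runs over all $m^T$ sequences. It also checks numerically that $M_{ij}(1,T) = M_{ji}(1,T)$, that $M_{ij}(0,T) = 1$, and that the moments lie in $[0, 1]$. The two HMM's are hypothetical `(pi, A, B)` triples.

```python
from itertools import product


def likelihood(pi, A, B, obs):
    """Pr[O_T | HMM] by the forward recursion."""
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def moment(k, hmm_i, hmm_j, m, T):
    """M_ij(k, T) of (5): sum over all m**T sequences of f_i(O_T)**k * f_j(O_T)."""
    total = 0.0
    for obs in product(range(m), repeat=T):  # m**T terms: exponential in T
        total += likelihood(*hmm_i, obs) ** k * likelihood(*hmm_j, obs)
    return total


hmm1 = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]])
hmm2 = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.3, 0.7], [0.6, 0.4]])
m, T = 2, 4
M12_1, M21_1 = moment(1, hmm1, hmm2, m, T), moment(1, hmm2, hmm1, m, T)
M12_2 = moment(2, hmm1, hmm2, m, T)
```

Here `M12_2 - M12_1 ** 2` is the variance of $f_1(O_T)$ when $O_T \in \mathrm{HMM}(2)$, the quantity that, together with the mean, characterizes the log-normal approximation mentioned in the Introduction.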

We now derive a recursion for (5) whose computational effort grows only linearly with $T$. The recursion is derived for a more general expression that contains (5) as a special case. For $k = 1, 2, \ldots$, define

$$R(k,T) = \sum_{O_T} \prod_{v=1}^{k} \Pr[O_T \mid \mathrm{HMM}(v)]. \qquad (6)$$

The application of (6) to compute moments is straightforward; for example, $R(k+1,T)$ equals $M_{21}(k,T)$ when $\mathrm{HMM}(2) = \cdots = \mathrm{HMM}(k+1)$. Note that $R(k,T)$ can be interpreted as a joint moment of HMM's, that is, as a joint moment of the likelihood functions $f_v(O_T)$ of the HMM's.

The derivation of the recursion for $R(k,T)$ proceeds as follows. The forward recursion portion of the forward-backward algorithm gives the expression

$$\Pr[O_T \mid \mathrm{HMM}(v)] = \sum_{j(v)=1}^{n(v)} \alpha^v_T(j(v)) \qquad (7)$$

where, for $2 \le t \le T$,

$$\alpha^v_t(j(v)) = \left[ \sum_{i(v)=1}^{n(v)} \alpha^v_{t-1}(i(v)) \, a^v_{i(v),j(v)} \right] b^v_{j(v)}(O(t)) \qquad (8)$$

and

$$\alpha^v_1(j(v)) = \pi^v_{j(v)} \, b^v_{j(v)}(O(1)). \qquad (9)$$
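The forward recursion (7)-(9) can be checked against the defining sum over all $n^T$ hidden state paths. The sketch below assumes a hypothetical 2-state, 2-symbol HMM stored as a `(pi, A, B)` triple; the comments mark which line implements which equation.

```python
from itertools import product


def forward(pi, A, B, obs):
    """Pr[O_T | HMM] by the forward recursion (7)-(9); obs holds symbol indexes."""
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]                 # (9)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]  # (8)
                 for j in range(n)]
    return sum(alpha)                                                 # (7)


def path_sum(pi, A, B, obs):
    """The same probability summed over all n**T hidden state paths."""
    n, T = len(pi), len(obs)
    total = 0.0
    for path in product(range(n), repeat=T):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, T):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


hmm = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]])  # hypothetical
```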

Substitute (7) into (6) to obtain

$$R(k,T) = \sum_{O_T} \prod_{v=1}^{k} \sum_{j(v)=1}^{n(v)} \alpha^v_T(j(v)) = \sum_{\substack{j(v)=1 \\ v=1,\ldots,k}}^{n(v)} \mu_T(j(1), \ldots, j(k)) \qquad (10)$$

where we define, for $t = 1, \ldots, T$,

$$\mu_t(j(1), \ldots, j(k)) = \sum_{O_t} \prod_{v=1}^{k} \alpha^v_t(j(v)). \qquad (11)$$

One interpretation of $\mu_T$ is that it equals $R(k,T)$ given that HMM($v$) must end in state $j(v)$, $v = 1, \ldots, k$. We seek a recursion for $\mu_t$. First suppose that $2 \le t \le T$. Then, substituting (8) into (11) gives

$$\mu_t(j(1), \ldots, j(k)) = \sum_{O_t} \prod_{v=1}^{k} \left[ \sum_{i(v)=1}^{n(v)} \alpha^v_{t-1}(i(v)) \, a^v_{i(v),j(v)} \right] b^v_{j(v)}(O(t)).$$

Because $\alpha^v_{t-1}(i(v))$ does not depend on the last symbol $O(t)$ in the observation sequence $O_t = \{O(1), \ldots, O(t)\}$, the sum over $O_t$ can be split into a sum over $O(t)$ and a sum over $O_{t-1}$. Interchanging the order of these sums as well as the indexes $i(v)$, and because of (11), we have

$$\mu_t(j(1), \ldots, j(k)) = \Gamma(j(1), \ldots, j(k)) \sum_{\substack{i(v)=1 \\ v=1,\ldots,k}}^{n(v)} \left[ \prod_{v=1}^{k} a^v_{i(v),j(v)} \right] \mu_{t-1}(i(1), \ldots, i(k)) \qquad (12)$$

where

$$\Gamma(j(1), \ldots, j(k)) = \sum_{O(t)} \prod_{v=1}^{k} b^v_{j(v)}(O(t)) = \sum_{s=1}^{m} \prod_{v=1}^{k} b^v_{j(v)}(V_s). \qquad (13)$$

Note that $\Gamma$ is the joint moment of the random observation variables uniquely associated with the states $j(v)$ of HMM($v$).

Equation (12) is the desired recursion for $2 \le t \le T$. For $t = 1$, substituting (9) into (11) gives

$$\mu_1(j(1), \ldots, j(k)) = \sum_{O(1)} \prod_{v=1}^{k} \pi^v_{j(v)} \, b^v_{j(v)}(O(1)) = \Gamma(j(1), \ldots, j(k)) \prod_{v=1}^{k} \pi^v_{j(v)}. \qquad (14)$$

Let $k = 1$. From the definition, it is clear that $R(1,T) = 1$ for all $T$, regardless of HMM(1), because the sum in (6) is over all $O_T$. To check the recursion (12)-(13) independently, note that, from (13),

$$\Gamma(j(1)) = \sum_{s=1}^{m} b^1_{j(1)}(V_s) = 1, \qquad 1 \le j(1) \le n(1).$$

From (14), we have $\mu_1(j(1)) = \pi^1_{j(1)}$. Hence, from (10), we obtain

$$R(1,1) = \sum_{j(1)=1}^{n(1)} \pi^1_{j(1)} = 1.$$

The recursion is verified for $T = 1$. For $T = 2$, from (12), we have

$$\mu_2(j(1)) = \Gamma(j(1)) \sum_{i(1)=1}^{n(1)} a^1_{i(1),j(1)} \, \mu_1(i(1)) = \sum_{i(1)=1}^{n(1)} \pi^1_{i(1)} \, a^1_{i(1),j(1)}$$

so that, from (10),

$$R(1,2) = \sum_{j(1)=1}^{n(1)} \sum_{i(1)=1}^{n(1)} \pi^1_{i(1)} \, a^1_{i(1),j(1)} = \sum_{i(1)=1}^{n(1)} \pi^1_{i(1)} \sum_{j(1)=1}^{n(1)} a^1_{i(1),j(1)} = \sum_{i(1)=1}^{n(1)} \pi^1_{i(1)} = 1$$

and the recursion is verified for $T = 2$.
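The precalculation (13) and the initial values (14) are short to write down. The sketch below builds a dictionary of all $\Gamma(j(1), \ldots, j(k))$ values from a list of symbol matrices $B^v$ (nested lists, states by symbols; the matrices shown are hypothetical) and reproduces the $k = 1$ facts used above, $\Gamma(j(1)) = 1$ and $R(1,1) = 1$.

```python
from itertools import product
from math import prod


def gamma_table(Bs):
    """Gamma(j(1), ..., j(k)) of (13) for every index set; Bs lists the k matrices B^v."""
    m = len(Bs[0][0])
    return {js: sum(prod(B[j][s] for B, j in zip(Bs, js)) for s in range(m))
            for js in product(*(range(len(B)) for B in Bs))}


def mu_1(pis, G):
    """mu_1(j(1), ..., j(k)) of (14): Gamma times the product of initial probabilities."""
    return {js: g * prod(pi[j] for pi, j in zip(pis, js)) for js, g in G.items()}


B1 = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical symbol matrix of HMM(1)
B2 = [[0.5, 0.5], [0.5, 0.5]]  # hypothetical symbol matrix of HMM(2)
pi1 = [0.6, 0.4]
G1 = gamma_table([B1])         # k = 1: every entry equals 1
G12 = gamma_table([B1, B2])    # k = 2
```

Summing `mu_1([pi1], G1)` over its index sets gives $R(1,1)$ by (10).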

The first nontrivial special case is $k = 2$. In this case, $R(2,T)$ is identically the first moment $M_{12}(1,T)$. From (12), we have for $2 \le t \le T$

$$\mu_t(j(1), j(2)) = \Gamma(j(1), j(2)) \sum_{i(1)=1}^{n(1)} \sum_{i(2)=1}^{n(2)} \mu_{t-1}(i(1), i(2)) \, a^1_{i(1),j(1)} \, a^2_{i(2),j(2)}$$

and, from (14),

$$\mu_1(j(1), j(2)) = \Gamma(j(1), j(2)) \, \pi^1_{j(1)} \, \pi^2_{j(2)}$$

where, from (13),

$$\Gamma(j(1), j(2)) = \sum_{s=1}^{m} b^1_{j(1)}(V_s) \, b^2_{j(2)}(V_s).$$

From (10), then, we have

$$R(2,T) = \sum_{j(1)=1}^{n(1)} \sum_{j(2)=1}^{n(2)} \mu_T(j(1), j(2)).$$

Computation of $R(2,T) = M_{12}(1,T)$ is therefore not excessively laborious.
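Putting (10) and (12)-(14) together gives the forward moment recursion. The Python sketch below is a direct, unoptimized transcription (no symmetry reduction, no scaling) for $k$ HMM's given as hypothetical `(pi, A, B)` triples over a common alphabet; it is cross-checked against the exponential-cost direct sum (6) for small $T$.

```python
from itertools import product
from math import prod


def joint_moment(hmms, T):
    """R(k, T) of (6) via the recursion (12)-(14) and the sum (10)."""
    pis, As, Bs = zip(*hmms)
    m = len(Bs[0][0])
    idx = list(product(*(range(len(pi)) for pi in pis)))  # all (j(1), ..., j(k))
    # (13): Gamma depends only on the symbol matrices, so compute it once.
    G = {jj: sum(prod(B[j][s] for B, j in zip(Bs, jj)) for s in range(m))
         for jj in idx}
    # (14): initial values mu_1.
    mu = {jj: G[jj] * prod(pi[j] for pi, j in zip(pis, jj)) for jj in idx}
    # (12): mu_t from mu_{t-1} for t = 2, ..., T.
    for _ in range(2, T + 1):
        mu = {jj: G[jj] * sum(prod(A[i][j] for A, i, j in zip(As, ii, jj)) * mu[ii]
                              for ii in idx)
              for jj in idx}
    return sum(mu.values())  # (10)


def likelihood(pi, A, B, obs):
    """Pr[O_T | HMM] by the forward recursion (7)-(9)."""
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def joint_moment_brute(hmms, T):
    """R(k, T) by direct summation of (6) over all m**T sequences."""
    m = len(hmms[0][2][0])
    return sum(prod(likelihood(*h, obs) for h in hmms)
               for obs in product(range(m), repeat=T))


hmm1 = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]])
hmm2 = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.3, 0.7], [0.6, 0.4]])
hmm3 = ([1.0, 0.0, 0.0], [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]],
        [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]])
hmms = [hmm1, hmm2, hmm3]
```

With `[hmm1, hmm2]`, `joint_moment` returns $R(2,T) = M_{12}(1,T)$; passing `[hmm1, hmm2, hmm2]` gives $R(3,T) = M_{21}(2,T)$.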

The evaluation of $R(k,T)$ using the recursion (12) is properly broken into two parts. The first is the precalculation of $\Gamma(j(1), \ldots, j(k))$ for every possible value of the indexes $j(v)$. This requires $(k-1)\,m\,N^k$ multiplications and $N^k$ storage locations, where

$$N = \left[ \prod_{v=1}^{k} n(v) \right]^{1/k} \qquad (15)$$

is the geometric mean of the number of different states in the various HMM's and is not necessarily an integer. If $N = 8$ and if there are $m = 16$ different observation symbols, then computing and storing $\Gamma$ for $k = 6$ requires 262 144 storage locations and $2.1 \times 10^7$ multiplications. Storage is clearly a more crucial issue than multiplications.
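The counts quoted above follow from simple arithmetic. The helper below (a hypothetical name) returns the geometric mean (15), the storage $N^k = \prod_v n(v)$, and the multiplication count $(k-1)\,m\,N^k$.

```python
from math import prod


def gamma_cost(n_states, m):
    """(N, storage, multiplications) for precomputing Gamma.

    n_states lists n(1), ..., n(k); N is the geometric mean (15),
    storage is N**k and the multiplication count is (k - 1) m N**k.
    """
    k = len(n_states)
    storage = prod(n_states)          # equals N**k exactly
    N = storage ** (1.0 / k)          # geometric mean (15), need not be an integer
    return N, storage, (k - 1) * m * storage


N, storage, mults = gamma_cost([8] * 6, m=16)
```

For $N = 8$, $m = 16$, and $k = 6$ this gives 262 144 storage locations and 20 971 520 (about $2.1 \times 10^7$) multiplications.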

It is possible in some cases to utilize the underlying symmetries of $\Gamma$ to reduce both storage and computational effort. For example, if $\mathrm{HMM}(2) = \cdots = \mathrm{HMM}(k+1)$, then

$$\Gamma(j(1), j(2), \ldots, j(k+1)) = \Gamma(j(1), \sigma(j(2)), \ldots, \sigma(j(k+1))) \qquad (16)$$

for every permutation $\sigma$ of the $k$ integers $j(2), \ldots, j(k+1)$. The proof of (16) follows easily from (13) because multiplication is commutative. Thus, one need only consider indexes that satisfy

$$1 \le j(1) \le n(1) \quad \text{and} \quad 1 \le j(2) \le j(3) \le \cdots \le j(k+1) \le n(2).$$

The number of ordered index sets $(j(2), \ldots, j(k+1))$ is equal to the number of combinations of $n(2)$ letters taken $k$ at a time, when each letter may be repeated any number of times up to $k$. Storage is therefore proportional to

$$N_{k+1} = \frac{n(2)\,(n(2)+1) \cdots (n(2)+k-1)}{k!} \, n(1)$$

which is significantly smaller than the $[n(2)]^k \, n(1)$ storage that would otherwise be necessary. The total multiplication count is also reduced proportionately.
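The storage count $N_{k+1}$ is $n(1)$ times the number of multisets of size $k$ drawn from $n(2)$ values, i.e. $n(1)\binom{n(2)+k-1}{k}$. The sketch below (helper name hypothetical) computes it with `math.comb` and cross-checks it by listing the nondecreasing tuples $1 \le j(2) \le \cdots \le j(k+1) \le n(2)$.

```python
from itertools import combinations_with_replacement
from math import comb


def symmetric_storage(n1, n2, k):
    """N_{k+1} when HMM(2) = ... = HMM(k+1): n(1) times C(n(2) + k - 1, k)."""
    return n1 * comb(n2 + k - 1, k)


n1 = n2 = 8
k = 6
reduced = symmetric_storage(n1, n2, k)
full = n2 ** k * n1
# Cross-check: count the nondecreasing tuples 1 <= j(2) <= ... <= j(k+1) <= n(2).
listed = sum(1 for _ in combinations_with_replacement(range(1, n2 + 1), k))
```

For $n(1) = n(2) = 8$ and $k = 6$ this gives 13 728 stored values instead of 2 097 152.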

Once $\Gamma$ has been computed and stored for a given value of $k$, the recursion (12) can be computed for any length $T$ of the observation sequence. For each of the $N^k$ sets of indexes $\{j(v)\}$ in (12), the sum over all $N^k$ indexes $\{i(v)\}$ must be undertaken. This sum appears to require $kN^k$ multiplications; however, by using the nested form

$$\sum_{i(1)=1}^{n(1)} a^1_{i(1),j(1)} \sum_{i(2)=1}^{n(2)} a^2_{i(2),j(2)} \cdots \sum_{i(k)=1}^{n(k)} a^k_{i(k),j(k)} \, \mu_{t-1}(i(1), \ldots, i(k))$$

it is possible to use approximately

$$N^k + N^{k-1} + \cdots + N^2 + N = \frac{N}{N-1}\,(N^k - 1)$$

multiplications instead. If lower order terms are neglected, computing one iteration of (12) requires about $N^{2k}$ multiplications. For an observation sequence of length $T$, computing $\mu_T$ requires on the order of $N^{2k} T$ multiplications. If $N = 8$ and $T = 32$, then $2.2 \times 10^{12}$ multiplications are required for $k = 6$. Assuming a multiplication takes 1 µs, the calculation requires 611 h and is clearly impractical.
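The figures in this paragraph can be reproduced directly: the nested-sum count per index set, its closed form, the leading-order total $N^{2k} T$, and the running time at one multiplication per microsecond. The helper name is hypothetical.

```python
def recursion_cost(N, k, T, seconds_per_mult):
    """Approximate cost of computing mu_T by (12) with the nested sums."""
    nested = sum(N ** r for r in range(1, k + 1))  # N^k + ... + N per index set
    closed_form = N * (N ** k - 1) // (N - 1)      # (N / (N - 1)) (N^k - 1)
    total = N ** (2 * k) * T                       # leading-order count N^{2k} T
    hours = total * seconds_per_mult / 3600.0
    return nested, closed_form, total, hours


nested, closed_form, total, hours = recursion_cost(N=8, k=6, T=32, seconds_per_mult=1e-6)
```

Here `total` is $2^{41} \approx 2.2 \times 10^{12}$ and `hours` is about 610.8, i.e. the 611 h quoted above.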

Significant reduction in computational effort is possible in some cases by utilizing the underlying symmetries in $\mu_t$. For example, if HMM(2) = ⋯ = HMM(k+1), then

$$\mu_t\big(j(1), j(2), \ldots, j(k+1)\big) = \mu_t\big(j(1), \sigma(j(2)), \ldots, \sigma(j(k+1))\big) \qquad (17)$$

for every permutation σ of the k integers j(2), ..., j(k+1). The proof of (17) follows easily by induction from (12), because multiplication is commutative and because $\Gamma$ satisfies the same symmetry property (16) in this case.
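The symmetry (17) is easy to confirm numerically on small models. The sketch below runs the unscaled recursion (12) with $\Gamma$ from (13). For the starting step it assumes $\mu_1 = \big[\prod_v \pi^v_{j(v)}\big]\,\Gamma$, which follows from (11) and the usual forward initialization. The models are randomly generated, and `random_hmm` and the other names are hypothetical. HMM(2) and HMM(3) are the same model here, so swapping j(2) and j(3) should leave $\mu_t$ unchanged up to rounding.

```python
import itertools
import math
import random


def random_hmm(n, m, rng):
    """Hypothetical test model: random row-stochastic (pi, A, B)."""
    def row(size):
        w = [rng.random() + 0.1 for _ in range(size)]
        return [x / sum(w) for x in w]
    return row(n), [row(n) for _ in range(n)], [row(m) for _ in range(n)]


def mu_tables(hmms, m, T):
    """Unscaled recursion (12); returns [mu_1, ..., mu_T] keyed by index tuples."""
    idx = list(itertools.product(*(range(len(pi)) for pi, _, _ in hmms)))
    # Gamma(j(1), ..., j(k)) = sum over symbols of prod_v b^v_{j(v)}(V_s), eq. (13)
    G = {j: sum(math.prod(B[jv][s] for (_, _, B), jv in zip(hmms, j))
                for s in range(m)) for j in idx}
    mu = {j: math.prod(pi[jv] for (pi, _, _), jv in zip(hmms, j)) * G[j] for j in idx}
    tables = [mu]
    for _ in range(T - 1):
        prev = tables[-1]
        tables.append({j: G[j] * sum(
            math.prod(A[iv][jv] for (_, A, _), iv, jv in zip(hmms, i, j)) * prev[i]
            for i in idx) for j in idx})
    return tables


rng = random.Random(3)
h1, h2 = random_hmm(2, 3, rng), random_hmm(3, 3, rng)
tables = mu_tables([h1, h2, h2], m=3, T=4)      # HMM(2) = HMM(3)
symmetric = all(math.isclose(mu[(a, b, c)], mu[(a, c, b)], rel_tol=1e-12)
                for mu in tables for (a, b, c) in mu)
print(symmetric)
```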




IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 38, NO. 4, APRIL 1990


Thus,  the  recursion  (12)  need  be  computed  for  only  Nk  + , 
sets  of  indexes.  The  total  multiplication  count  is  reduced 
to  4 N2k  + 1 7,  which  is  significantly  smaller  than  the  Nn  T 
multiplications  that  would  otherwise  be  needed.  For  the 
above example requiring 611 h, if N = n(1) = n(2) = 8 and if the symmetry (17) is utilized, the calculation would be reduced to roughly 96 min. Utilizing symmetry is clearly significant in that it can turn an impractically long calculation into a feasible shorter one.

Underflow is potentially a problem when the recursion (12) is computed. It can be easily overcome in exactly the same manner as pointed out in [2] for preventing numerical underflow during the calculation of the forward-backward algorithm. Specifically, let $\mu_t$ be computed according to (12) and then multiplied by a scale factor $c_t$ defined by

$$c_t = \left[\,\sum_{j(1)=1}^{n(1)} \cdots \sum_{j(k)=1}^{n(k)} \mu_t\big(j(1), \ldots, j(k)\big)\right]^{-1}. \qquad (18)$$

Then use the scaled values of $\mu_t$ in the recursion (12) to compute $\mu_{t+1}$, which is in turn scaled as shown in (18). Because (12) is linear in $\mu_t$, the scaled array at time t equals $c_1 c_2 \cdots c_t\,\mu_t$, and by (18) it sums to one. If we continue in this fashion and recall the expression in (10), it follows that

$$R(k, T) = \prod_{t=1}^{T} c_t^{-1}. \qquad (19)$$

Because the product cannot be evaluated without underflow, we compute instead

$$\log R(k, T) = -\sum_{t=1}^{T} \log c_t. \qquad (20)$$
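The scaled computation (18)-(20) can be sketched as follows. It is written in plain Python and uses hypothetical names and random test models. The sketch carries the scaled array forward, accumulates $\log c_t$, and returns $\log R(k, T)$ as in (20). It assumes the same forward initialization of $\mu_1$ as above. For short T, the result is checked against a brute-force evaluation of $R(k, T) = \sum_{O_T} \prod_v f_v(O_T)$, the discrete counterpart of (24), with each $f_v$ computed by the unscaled forward algorithm.

```python
import itertools
import math
import random


def random_hmm(n, m, rng):
    """Hypothetical test model: random row-stochastic (pi, A, B)."""
    def row(size):
        w = [rng.random() + 0.1 for _ in range(size)]
        return [x / sum(w) for x in w]
    return row(n), [row(n) for _ in range(n)], [row(m) for _ in range(n)]


def log_R(hmms, m, T):
    """log R(k, T) from the recursion (12) with the scaling (18)-(20)."""
    idx = list(itertools.product(*(range(len(pi)) for pi, _, _ in hmms)))
    G = {j: sum(math.prod(B[jv][s] for (_, _, B), jv in zip(hmms, j))
                for s in range(m)) for j in idx}
    mu = {j: math.prod(pi[jv] for (pi, _, _), jv in zip(hmms, j)) * G[j] for j in idx}
    sum_log_c = 0.0
    for t in range(T):
        if t > 0:  # one step of (12), applied to the scaled array
            prev = mu
            mu = {j: G[j] * sum(
                math.prod(A[iv][jv] for (_, A, _), iv, jv in zip(hmms, i, j)) * prev[i]
                for i in idx) for j in idx}
        c = 1.0 / sum(mu.values())              # scale factor (18)
        mu = {j: c * v for j, v in mu.items()}
        sum_log_c += math.log(c)
    return -sum_log_c                           # (20)


def likelihood(hmm, obs):
    """Unscaled forward algorithm; adequate only for short sequences."""
    pi, A, B = hmm
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


def R_brute(hmms, m, T):
    return sum(math.prod(likelihood(h, obs) for h in hmms)
               for obs in itertools.product(range(m), repeat=T))


rng = random.Random(1)
h1, h2 = random_hmm(2, 3, rng), random_hmm(3, 3, rng)
print(math.exp(log_R([h1, h2], 3, 5)), R_brute([h1, h2], 3, 5))
print(log_R([h1, h2], 3, 5000))   # R itself would underflow to 0.0 here
```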


Any convenient scale factor can be used instead of (18). A potentially useful one might be to take $c_t = N^k$. This choice would eliminate the effort of computing the sum in (18) before scaling, although the right side of (19) must then be multiplied by the sum of the final scaled values.

B. Continuous Symbol HMM's

The objective of this section is to show that the moment algorithm for discrete symbol HMM's can be carried over essentially unchanged to continuous symbol HMM's. In fact, it also holds for continuous vector symbol HMM's; however, only continuous symbol HMM's are treated here for simplicity.

Throughout this section, it is assumed that each output symbol O(t) is a real random variable defined on some underlying event space V. The probability density function of O(t) is uniquely defined for each state i(v) = 1, ..., n(v) of each HMM(v), v = 1, 2, ..., and is denoted as $b^v_{i(v)}(x)$. Thus, for real numbers α and β with α < β, we have

$$\int_{\alpha}^{\beta} b^v_{i(v)}(x)\,dx = \Pr\big[\alpha \le O(t) \le \beta \mid \text{HMM}(v) \text{ and state} = i(v)\big]. \qquad (21)$$

An observation sequence $O_T = \{x_t,\ t = 1, 2, \ldots, T\}$ is a sequence of real numbers $x_t$, with $x_t$ being a realization of the random variable O(t). The posterior likelihood function $f_v(O_T)$ is a probability density function for continuous symbol HMM's, as opposed to a simple probability [see (1)] for discrete symbol HMM's. Thus, for real vectors α and β with α < β, we have

$$\int_{\alpha}^{\beta} f_v(O_T)\,dO_T = \Pr\big[\alpha \le O_T \le \beta \mid \text{HMM}(v)\big], \qquad (22)$$

where $dO_T = dx_1 \cdots dx_T$.

For continuous symbol HMM's, the functions $F_{ij}(x)$ are defined just as in (2), but with a T-fold integral over $O_T$ replacing the T-fold sum over $O_T$. Thus, we have the differential

$$dF_{ij}(x) = \int_{O_T} \delta\big(x - f_i(O_T)\big)\, f_j(O_T)\, dO_T\, dx,$$

where δ(·) denotes the Dirac delta function. From (3), the moments are given by

$$M_{ij}(k, T) = \int_{-\infty}^{\infty} x^k \int_{O_T} \delta\big(x - f_i(O_T)\big)\, f_j(O_T)\, dO_T\, dx = \int_{O_T} f_j(O_T) \int_{-\infty}^{\infty} x^k\, \delta\big(x - f_i(O_T)\big)\, dx\, dO_T = \int_{O_T} \{f_i(O_T)\}^k f_j(O_T)\, dO_T, \qquad (23)$$

which is the continuous analog of (5). It is clear from (23) that, in general, $M_{ij}(k, T) = M_{ji}(k, T)$ only for the special case k = 1. The analog of (6) for continuous symbol HMM's is

$$R(k, T) = \underbrace{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}}_{T\text{-fold}} \prod_{v=1}^{k} f_v(O_T)\, dO_T. \qquad (24)$$

The forward-backward algorithm for computing the posterior likelihood function for continuous symbol HMM's is modified [5] as follows:

$$f_v(O_T) = \sum_{j(v)=1}^{n(v)} \alpha^v_T\big(j(v)\big), \qquad (25)$$

where $\alpha^v_T(j(v))$ is computed exactly as given by the recursion (8) and (9). The only difference is that $b^v_{j(v)}(O(t))$ in (8) is now interpreted as the probability density function implicit in (21). Consequently, (10) still holds exactly if we define

$$\mu_t\big(j(1), \ldots, j(k)\big) = \underbrace{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}}_{t\text{-fold}} \prod_{v=1}^{k} \alpha^v_t\big(j(v)\big)\, dO_t \qquad (26)$$




STREIT: MOMENTS OF MATCHED AND MISMATCHED HMM'S

as the analog of (11). Proceeding as before, with t-fold integrals replacing t-fold summations, gives exactly the recursion (12), but with the one-dimensional integral

$$\Gamma\big(i(1), \ldots, i(k)\big) = \int_{-\infty}^{\infty} \prod_{v=1}^{k} b^v_{i(v)}(x)\, dx \qquad (27)$$

in place of (13).

The remarks in the preceding section concerning storage, multiplication counts, and symmetry properties all apply to continuous symbol HMM's. The primary difference is that (27) requires an integral evaluation instead of a finite sum as in (13). This evaluation increases the initial computational overhead, but once (27) is computed, the algorithm (12) proceeds exactly as before.
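For continuous symbols, the only new ingredient is the integral (27). As an illustration only, assume Gaussian state densities; the text does not specify any family. Under that assumption, (27) can be evaluated by quadrature. For k = 2 there is also a closed form that serves as a check:

$$\int N(x; m_1, s_1^2)\,N(x; m_2, s_2^2)\,dx = \frac{\exp\!\big(-(m_1 - m_2)^2 / (2(s_1^2 + s_2^2))\big)}{\sqrt{2\pi(s_1^2 + s_2^2)}}.$$

The function names below are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def gamma_quadrature(params, lo=-30.0, hi=30.0, n=60001):
    """Trapezoidal approximation of (27) for Gaussian densities b^v(x).

    `params` lists (mean, sd) for the k states i(1), ..., i(k).
    """
    h = (hi - lo) / (n - 1)
    total = 0.0
    for step in range(n):
        x = lo + step * h
        weight = 0.5 if step in (0, n - 1) else 1.0
        total += weight * math.prod(normal_pdf(x, m, s) for m, s in params)
    return total * h


def gamma_two_gaussians(m1, s1, m2, s2):
    """Closed form of (27) for k = 2 Gaussian densities."""
    v = s1 ** 2 + s2 ** 2
    return math.exp(-0.5 * (m1 - m2) ** 2 / v) / math.sqrt(2.0 * math.pi * v)


print(gamma_quadrature([(0.0, 1.0), (1.5, 0.7)]), gamma_two_gaussians(0.0, 1.0, 1.5, 0.7))
print(gamma_quadrature([(0.0, 1.0), (0.5, 0.8), (1.0, 1.2)]))   # a k = 3 entry
```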

C. The Forward-Backward Moment Algorithm

The moment algorithm presented above in this section used the forward probabilities defined by (8)-(9). It is equally feasible to use the backward probabilities for the same purpose. They are defined by

$$\beta^v_T\big(j(v)\big) = 1$$

and, for T − 1 ≥ t ≥ 1, by

$$\beta^v_t\big(j(v)\big) = \sum_{i(v)=1}^{n(v)} a^v_{j(v)i(v)}\, b^v_{i(v)}\big(O(t+1)\big)\, \beta^v_{t+1}\big(i(v)\big).$$

The backward moment algorithm computes, for 1 ≤ t ≤ T − 1, the function

$$\gamma_t\big(j(1), \ldots, j(k)\big) = \sum_{\bar O_t} \prod_{v=1}^{k} \beta^v_t\big(j(v)\big),$$

where $\bar O_t = \{O(t+1), \ldots, O(T)\}$. The backward recursion is given by

$$\gamma_T\big(j(1), \ldots, j(k)\big) = 1$$

and, for T − 1 ≥ t ≥ 1, by

$$\gamma_t\big(j(1), \ldots, j(k)\big) = \sum_{i(1)=1}^{n(1)} \cdots \sum_{i(k)=1}^{n(k)} \left[\prod_{v=1}^{k} a^v_{j(v)i(v)}\right] \gamma_{t+1}\big(i(1), \ldots, i(k)\big)\, \Gamma\big(i(1), \ldots, i(k)\big).$$

The derivation of this recursion is similar to that of (12) and (13).

It is straightforward to show that, for any t with 1 ≤ t ≤ T,

$$R(k, T) = \sum_{j(1)=1}^{n(1)} \cdots \sum_{j(k)=1}^{n(k)} \mu_t\big(j(1), \ldots, j(k)\big)\, \gamma_t\big(j(1), \ldots, j(k)\big).$$

Note that the case t = T is (10). This identity is the analog for R(k, T) of the well-known Baum identity [2] for likelihood functions, i.e.,

$$f_v(O_T) = \sum_{i(v)=1}^{n(v)} \alpha^v_t\big(i(v)\big)\, \beta^v_t\big(i(v)\big).$$
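The identity above can be checked numerically. The sketch runs the forward recursion (12) and the backward recursion for $\gamma_t$ on small random models, then confirms that $\sum \mu_t \gamma_t$ is the same for every t. It is written in plain Python with hypothetical names. The table $\Gamma$ is stored as `G` and the backward arrays as `back`. It assumes the forward initialization $\mu_1 = \big[\prod_v \pi^v_{j(v)}\big]\,\Gamma$, since the corresponding equation lies outside this excerpt.

```python
import itertools
import math
import random


def random_hmm(n, m, rng):
    """Hypothetical test model: random row-stochastic (pi, A, B)."""
    def row(size):
        w = [rng.random() + 0.1 for _ in range(size)]
        return [x / sum(w) for x in w]
    return row(n), [row(n) for _ in range(n)], [row(m) for _ in range(n)]


def forward_backward_moments(hmms, m, T):
    idx = list(itertools.product(*(range(len(pi)) for pi, _, _ in hmms)))
    G = {j: sum(math.prod(B[jv][s] for (_, _, B), jv in zip(hmms, j))
                for s in range(m)) for j in idx}

    def trans(i, j):  # prod_v a^v_{i(v) j(v)}
        return math.prod(A[iv][jv] for (_, A, _), iv, jv in zip(hmms, i, j))

    mu = [{j: math.prod(pi[jv] for (pi, _, _), jv in zip(hmms, j)) * G[j] for j in idx}]
    for _ in range(T - 1):                       # forward recursion (12)
        prev = mu[-1]
        mu.append({j: G[j] * sum(trans(i, j) * prev[i] for i in idx) for j in idx})
    back = [{j: 1.0 for j in idx}]               # gamma_T = 1
    for _ in range(T - 1):                       # backward recursion, t = T-1, ..., 1
        nxt = back[0]
        back.insert(0, {j: sum(trans(j, i) * nxt[i] * G[i] for i in idx) for j in idx})
    return mu, back


rng = random.Random(7)
hmms = [random_hmm(2, 3, rng), random_hmm(3, 3, rng)]
T = 6
mu, back = forward_backward_moments(hmms, 3, T)
R = sum(mu[-1].values())                         # eq. (10), the case t = T
products = [sum(mu[t][j] * back[t][j] for j in mu[t]) for t in range(T)]
print([p / R for p in products])                 # each ratio should be 1 up to rounding
```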


III. Comparison of Theoretical Moments to Simulation

Ergodic Markov chains are those for which it is possible to transition from every state to every other state, although not necessarily in one step. Left-to-right Markov chains are those for which transitions to lower numbered states are not allowed, that is, have probability zero. These two types of chains are sufficiently different that they are considered separately in the examples.

One interpretation is that ergodic HMM's are models of quasi-stationary signals, while left-to-right HMM's are models of transient signals that ultimately become stationary (because the highest numbered state is not exited once it is entered). One might therefore expect these two types of HMM's to affect classification performance in different ways. The three examples given in this section support this expectation.

Using the above interpretation, the examples may be described as follows. The first example shows that classification using the suboptimal statistic $q_{\text{subopt}}$ reliably distinguishes between sufficiently long quasi-stationary signals with a reasonable amount of computational effort. The second example shows that short quasi-stationary and transient signals look significantly different to the HMM transient recognizer, but not to the HMM recognizer based on the quasi-stationary signal. The third example shows that noisy observations of transient signals adversely affect classification performance by making the transient signal appear to have a stationary component, which is then misclassified by the HMM transient recognizer.

A. Two Ergodic HMM's

HMM(1) and HMM(2) are five-state, eight-symbol ergodic models whose parameters are given (rounded to three significant decimals) in Tables I and II, respectively. HMM(1) clearly generates observation sequences of uniformly distributed symbols. HMM(2) is more complex in structure, but every symbol can be generated in every state. The fundamental question of interest here is the following: how long must an observation sequence be to guarantee that the suboptimal classification statistic $q_{\text{subopt}}$ is highly reliable (say, 99% correct) and has a low false dismissal rate (say, 0.5%)? We will give what may best be described as a semiempirical answer to this question.

Because of the nature of HMM(1), it is easy to see that

$$f_1(O_T) = \Pr[O_T \mid \text{HMM}(1)] = \left(\tfrac{1}{8}\right)^{T}.$$

In other words, the posterior likelihood function based on HMM(1) is constant, because all observation sequences are equally likely if $O_T \in$ HMM(1). In particular, $f_1(O_T)$ cannot distinguish $O_T \in$ HMM(1) from $O_T \in$ HMM(2) and thus is useless for classification.
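The claim that $f_1(O_T)$ does not depend on $O_T$ is easy to confirm by brute force with the parameters of Table I. The sketch below uses a hypothetical function name. It runs the unscaled forward algorithm on every sequence of length T = 4 and rescales each result by $8^T$.

```python
import itertools


def forward_likelihood(pi, A, B, obs):
    """Unscaled forward algorithm; adequate for the short sequences used here."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


# HMM(1) of Table I: 5 states, 8 symbols, every entry uniform.
pi1 = [0.2] * 5
A1 = [[0.2] * 5 for _ in range(5)]
B1 = [[0.125] * 8 for _ in range(5)]

T = 4
values = {round(forward_likelihood(pi1, A1, B1, obs) * 8 ** T, 12)
          for obs in itertools.product(range(8), repeat=T)}
print(values)  # a single value, 1.0: f_1(O_T) = 8**-T for every sequence
```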

The posterior likelihood function based on HMM(2), instead of HMM(1), is useful for classification. Ten thousand observation sequences $O_T$ of each HMM were generated, and the posterior likelihood $f_2(O_T)$ was computed using the forward-backward algorithm. Fig. 2 shows a histogram of the natural logarithm of $dF_{22}(x)$ for T = 25. The observation sequences are thus matched to the posterior likelihood function. Fig. 3 shows a histogram of $\log dF_{21}(x)$ for T = 25. In Fig. 3, then, $O_T$ is mismatched to the likelihood function. As is clear from Figs. 2 and 3, the difference between the mean values of the log likelihood functions is about 1.4 standard deviations. Thus, the potential exists for using $\log dF_{22}(x)$ to classify observation sequences. However, T = 25 is not long enough to classify with a high probability of detection ($P_D$) and a low false alarm probability ($P_F$).

TABLE I
Parameters of HMM(1)

NUMBER OF MARKOV STATES = 5
NUMBER OF SYMBOLS PER STATE = 8
INITIAL STATE PROBABILITY VECTOR:
2.00E-01  2.00E-01  2.00E-01  2.00E-01  2.00E-01
TRANSITION PROBABILITY MATRIX:
2.00E-01  2.00E-01  2.00E-01  2.00E-01  2.00E-01
2.00E-01  2.00E-01  2.00E-01  2.00E-01  2.00E-01
2.00E-01  2.00E-01  2.00E-01  2.00E-01  2.00E-01
2.00E-01  2.00E-01  2.00E-01  2.00E-01  2.00E-01
2.00E-01  2.00E-01  2.00E-01  2.00E-01  2.00E-01
SYMBOL PROBABILITY MATRIX (TRANSPOSED):
1.25E-01  1.25E-01  1.25E-01  1.25E-01  1.25E-01
1.25E-01  1.25E-01  1.25E-01  1.25E-01  1.25E-01
1.25E-01  1.25E-01  1.25E-01  1.25E-01  1.25E-01
1.25E-01  1.25E-01  1.25E-01  1.25E-01  1.25E-01
1.25E-01  1.25E-01  1.25E-01  1.25E-01  1.25E-01
1.25E-01  1.25E-01  1.25E-01  1.25E-01  1.25E-01
1.25E-01  1.25E-01  1.25E-01  1.25E-01  1.25E-01
1.25E-01  1.25E-01  1.25E-01  1.25E-01  1.25E-01

TABLE II
Parameters of HMM(2), Rounded to Three Significant Digits

NUMBER OF MARKOV STATES = 5
NUMBER OF SYMBOLS PER STATE = 8
INITIAL STATE PROBABILITY VECTOR:
1.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
TRANSITION PROBABILITY MATRIX:
1.40E-01  2.35E-01  3.08E-01  1.24E-01  1.94E-01
1.40E-01  1.14E-01  2.99E-01  2.13E-01  2.34E-01
4.37E-02  3.20E-01  1.72E-01  1.27E-01  3.38E-01
9.73E-02  4.97E-01  1.53E-02  1.15E-01  2.75E-01
2.36E-01  2.49E-02  4.27E-01  2.82E-01  2.98E-02
SYMBOL PROBABILITY MATRIX (TRANSPOSED):
1.81E-01  1.22E-01  7.89E-03  1.48E-01  7.04E-02
1.39E-01  8.28E-02  3.23E-02  9.13E-02  1.33E-01
2.67E-02  1.60E-01  5.87E-02  1.08E-01  2.34E-01
1.79E-01  1.66E-01  2.18E-01  1.30E-01  5.97E-02
1.56E-01  1.58E-01  2.15E-01  2.09E-01  2.35E-01
1.19E-01  5.75E-02  1.11E-01  1.02E-01  1.03E-01
1.76E-01  1.32E-01  2.40E-01  6.61E-02  1.76E-02
2.37E-02  1.22E-01  1.17E-01  1.46E-01  1.47E-01

Fig. 2. Histogram of 10 000 values of log dF_22(x) for T = 25. (The normal curve has the sample mean and variance given in Table III.)

Fig. 3. Histogram of 10 000 values of log dF_21(x) for T = 25. (The normal curve has the sample mean and variance given in Table III.)

A useful observation drawn from Figs. 2 and 3 is that the probability density function of $\log dF_{2j}(x)$ is nicely approximated by the normal distribution. Let $m_{ij}$ and $\sigma_{ij}$ denote the mean and standard deviation of $\log dF_{ij}(x)$. If $dF_{ij}(x)$ is log-normal, then $M_{ij}(k, T) = \exp\big(k\,m_{ij} + k^2 \sigma_{ij}^2 / 2\big)$. Solving the cases k = 1 and k = 2 shows that $m_{ij}$ and $\sigma_{ij}$ are related to the moments $M_{ij}(k, T)$ by the formulas

$$m_{ij} = 2 \log M_{ij}(1, T) - \tfrac{1}{2} \log M_{ij}(2, T) \qquad (28)$$

$$\sigma_{ij}^2 = \log M_{ij}(2, T) - 2 \log M_{ij}(1, T). \qquad (29)$$
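Because $M_{ij}(k, T)$ itself underflows for long sequences, a practical version of (28) and (29) works directly with $\log M_{ij}(1, T)$ and $\log M_{ij}(2, T)$, which is the form the scaled recursion (20) delivers. The sketch below uses a hypothetical function name. The round trip only checks the algebra, using illustrative values of m and s.

```python
import math


def lognormal_mean_sd(log_M1: float, log_M2: float):
    """Mean and standard deviation of log x from log E[x] and log E[x**2].

    Exact only when x is log-normal; otherwise an approximation.
    """
    mean = 2.0 * log_M1 - 0.5 * log_M2          # (28)
    var = log_M2 - 2.0 * log_M1                 # (29)
    return mean, math.sqrt(var)


# Round trip: if log x ~ N(m, s**2), then log E[x**k] = k*m + k**2 * s**2 / 2.
m, s = -845.8, 5.30
mean_sd = lognormal_mean_sd(m + 0.5 * s ** 2, 2.0 * m + 2.0 * s ** 2)
print(mean_sd)   # recovers about (-845.8, 5.3)
```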

It is stressed that (28) and (29) hold exactly if and only if $dF_{ij}(x)$ is truly log-normal. For finite symbol HMM's, $dF_{ij}(x)$ is necessarily discrete, so both (28) and (29) must be viewed as approximations. Sufficient conditions under which it may be proved that $dF_{ij}(x)$ is, in some sense, approximately log-normal are unknown. Although the central limit theorem is surely responsible for this log-normal behavior, it is not clear how to apply it in this setting.

TABLE III
Comparison of Two Estimates for the Mean and Standard Deviation of log dF_2j(x) for j = 1, 2

          Mean Value            Standard Deviation
   T      Sample    Eq. (28)    Sample    Eq. (29)
j = 1
   5      -10.8     -10.6       0.95      0.71
  10      -21.3     -21.2       1.11      0.92
  15      -31.9     -31.8       1.24      1.10
  20      -42.4     -42.4       1.35      1.25
  25      -53.0     -52.9       1.41      1.38
  50     -105.8    -105.8       1.93      1.91
 100     -211.4    -211.5       2.62      2.67
 200     -422.6    -423.0       3.60      3.76
 400     -845.0    -845.8       5.09      5.30
j = 2
   5      -10.1     -10.1       0.69      0.59
  10      -20.3     -20.3       0.90      0.84
  15      -30.6     -30.5       1.06      1.03
  20      -40.8     -40.8       1.23      1.20
  25      -51.0     -51.0       1.37      1.34
  50     -102.1    -102.1       1.92      1.90
 100     -204.4    -204.4       2.66      2.69
 200     -408.9    -409.0       3.77      3.60
 400     -818.0    -818.1       5.33      5.37

TABLE IV
Parameters of HMM(3)

NUMBER OF MARKOV STATES = 5
NUMBER OF SYMBOLS PER STATE = 8
INITIAL STATE PROBABILITY VECTOR:
1.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
TRANSITION PROBABILITY MATRIX:
6.00E-01  4.00E-01  0.00E+00  0.00E+00  0.00E+00
0.00E+00  7.00E-01  2.00E-01  1.00E-01  0.00E+00
0.00E+00  0.00E+00  6.00E-01  4.00E-01  0.00E+00
0.00E+00  0.00E+00  0.00E+00  7.00E-01  3.00E-01
0.00E+00  0.00E+00  0.00E+00  0.00E+00  1.00E+00
SYMBOL PROBABILITY MATRIX (TRANSPOSED):
9.00E-01  1.00E-01  0.00E+00  0.00E+00  0.00E+00
1.00E-01  6.00E-01  0.00E+00  0.00E+00  0.00E+00
0.00E+00  2.00E-01  3.00E-01  0.00E+00  0.00E+00
0.00E+00  1.00E-01  6.00E-01  1.00E-01  0.00E+00
0.00E+00  0.00E+00  1.00E-01  2.00E-01  0.00E+00
0.00E+00  0.00E+00  0.00E+00  4.00E-01  1.00E-01
0.00E+00  0.00E+00  0.00E+00  3.00E-01  6.00E-01
0.00E+00  0.00E+00  0.00E+00  0.00E+00  3.00E-01

Table III compares the means and standard deviations of $\log dF_{2j}(x)$ estimated from 10 000 observation sequences $O_T$ with those calculated from (28) and (29). The table shows good agreement between the approximations (28) and (29) and the sample means and variances. It also establishes that observation sequences of length T = 400 are long enough to distinguish between $O_T \in$ HMM(1) and $O_T \in$ HMM(2) with high reliability: the difference between the mean value of $\log dF_{21}(x)$ and the mean value of $\log dF_{22}(x)$ is about 5.2 standard deviations. Assuming $\log dF_{21}(x)$ and $\log dF_{22}(x)$ are normally distributed, as they appear to be, classification using $q_{\text{subopt}}$ has a probability of correct classification of 99% for a false alarm rate of 0.5%.
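The 99% figure can be reproduced from the stated 5.2-standard-deviation separation under one simplifying assumption: both log-likelihood distributions are normal with a common standard deviation. The sample values in Table III are close to each other but not equal. In the sketch below, the threshold gives a 0.5% false alarm rate on the mismatched (lower-mean) distribution, and the detection probability is read off the matched one. The function name is hypothetical.

```python
from statistics import NormalDist


def detection_probability(separation_sd: float, false_alarm: float) -> float:
    """P_D for a threshold test between two equal-variance normal distributions.

    `separation_sd` is the difference of the means in units of the common
    standard deviation; the threshold is placed so that the lower-mean
    (mismatched) distribution exceeds it with probability `false_alarm`.
    """
    z = NormalDist().inv_cdf(1.0 - false_alarm)
    return 1.0 - NormalDist().cdf(z - separation_sd)


print(detection_probability(5.2, 0.005))   # roughly 0.996
```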

Computing the posterior likelihood function $f_2(O_T)$ for T = 400 requires $n^2 T = 10\,000$ multiplications; thus, the computational requirements for $f_2(O_{400})$ are small enough for practical application. Furthermore, the forward-backward algorithm for computing $f_2(O_T)$ is mathematically equivalent to a nested sequence of matrix-vector multiplications. Consequently, total computation time could be reduced by designing a “black box” that exploits this special structure in hardware.
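The matrix-vector structure can be made concrete with a short sketch. The forward pass below is written as repeated products of the row vector α with the transition matrix, each followed by weighting with the symbol probabilities. Rescaling α at each step, in the spirit of (18), keeps $f_v(O_T)$ usable through its logarithm. For HMM(1) and T = 400, $f_1 = 8^{-400} \approx 10^{-361}$ lies below the double-precision range, yet the logarithm is recovered. Names are hypothetical.

```python
import math
import random


def log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm as a chain of matrix-vector products.

    Each step forms alpha^T A (n*n multiplications) and multiplies by the
    symbol probabilities (n more), so the cost is about n**2 * T.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_f = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        s = sum(alpha)              # scale factor, accumulated in log form
        log_f += math.log(s)
        alpha = [a / s for a in alpha]
    return log_f


# HMM(1) from Table I: every sequence has likelihood 8**-T.
pi1 = [0.2] * 5
A1 = [[0.2] * 5 for _ in range(5)]
B1 = [[0.125] * 8 for _ in range(5)]
rng = random.Random(0)
obs = [rng.randrange(8) for _ in range(400)]
print(log_likelihood(pi1, A1, B1, obs), -400 * math.log(8))
```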

B. Mixed Ergodic and Left-to-Right HMM's

HMM(3) is a five-state, eight-symbol left-to-right model whose parameters are given in Table IV. It has a structure that might conceivably arise in the SIIWR problem. Note that HMM(3) never leaves the fifth state once it is entered. Consequently, all sufficiently long observation sequences ultimately contain only the three symbols $V_6$, $V_7$, and $V_8$. Note also that the symbol $V_8$ occurs if and only if the fifth state has been entered. It follows that an observation sequence $O_T$ that contains the symbol $V_8$ and subsequently contains any of the five symbols $V_1$, $V_2$, $V_3$, $V_4$, or $V_5$ must have posterior likelihood zero, i.e., $f_3(O_T) = 0$. Other forbidden symbol sequences may also be noticed. It will be seen that these facts make $f_3(O_T)$ a powerful discriminator against ergodic observation sequences. To summarize briefly, this example will show that short observation sequences of quasi-stationary and transient HMM's look very different to the transient HMM recognizer. On the other hand, all observation sequences look somewhat alike to ergodic HMM recognizers.
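The forbidden-sequence property can be illustrated with the HMM(3) parameters of Table IV. The sketch below uses hypothetical names and codes the symbols $V_1, \ldots, V_8$ as 0, ..., 7. It evaluates $f_3(O_T)$ with an unscaled forward pass. A sequence in which $V_8$ is later followed by $V_1$ gets likelihood exactly zero, while a sequence compatible with the left-to-right structure does not.

```python
def forward_likelihood(pi, A, B, obs):
    """Unscaled forward algorithm; fine for the short sequences used here."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


# HMM(3), Table IV. B is stored one row per state (the table prints its transpose).
pi3 = [1.0, 0.0, 0.0, 0.0, 0.0]
A3 = [[0.6, 0.4, 0.0, 0.0, 0.0],
      [0.0, 0.7, 0.2, 0.1, 0.0],
      [0.0, 0.0, 0.6, 0.4, 0.0],
      [0.0, 0.0, 0.0, 0.7, 0.3],
      [0.0, 0.0, 0.0, 0.0, 1.0]]
B3 = [[0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.1, 0.6, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.3, 0.6, 0.1, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.3, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.6, 0.3]]

allowed = [0, 1, 3, 5, 7]          # V1 V2 V4 V6 V8
forbidden = [0, 1, 3, 5, 7, 0]     # ... V8 followed by V1
print(forward_likelihood(pi3, A3, B3, allowed), forward_likelihood(pi3, A3, B3, forbidden))
```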

When HMM(3) enters its fifth state, it becomes stationary and, consequently, significantly less interesting. Insight into the length of the transient portion of HMM(3) observation sequences is gained by estimating the first passage time of HMM(3) into its fifth state, that is, the number of transitions in the Markov chain before its fifth state is entered. The mean and variance of first passage times may be computed explicitly [6]; however, simulation was used here instead. In 10 000 observation sequences generated for HMM(3), the mean and standard deviation of the first passage time were found to be 10.9 and 4.8, respectively. The least first passage time was three transitions, and the largest was 43 transitions. Thus, for practical purposes, observation sequences become stationary for t ≥ 50.
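The explicit computation mentioned above can be sketched with the standard absorbing-chain formulas. This choice of method is ours; reference [6] may proceed differently. Let Q hold the transition probabilities among the transient states 1-4 of HMM(3) (Table IV). The mean absorption times t solve (I − Q) t = 1, and the variance is (2N − I) t − t², where N = (I − Q)^{-1} and t² is taken componentwise. Exact rational arithmetic gives a mean of 65/6 ≈ 10.83 and a standard deviation of about 4.82 from state 1, in line with the simulated 10.9 and 4.8. Names are hypothetical.

```python
from fractions import Fraction as F

# Transitions among the transient states 1-4 of HMM(3) (Table IV); state 5 absorbs.
Q = [[F(6, 10), F(4, 10), F(0), F(0)],
     [F(0), F(7, 10), F(2, 10), F(1, 10)],
     [F(0), F(0), F(6, 10), F(4, 10)],
     [F(0), F(0), F(0), F(7, 10)]]


def solve_i_minus_q(Q, rhs):
    """Solve (I - Q) x = rhs by back substitution (Q is upper triangular here)."""
    n = len(Q)
    x = [F(0)] * n
    for i in reversed(range(n)):
        acc = rhs[i] + sum(Q[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / (1 - Q[i][i])
    return x


t = solve_i_minus_q(Q, [F(1)] * 4)        # mean number of transitions before state 5
u = solve_i_minus_q(Q, t)                 # N t with N = (I - Q)^-1
var = [2 * u[i] - t[i] - t[i] ** 2 for i in range(4)]   # (2N - I) t - t**2
print(t[0], float(t[0]), float(var[0]) ** 0.5)          # 65/6, about 10.83 and 4.82
```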

Fig. 4 and Table V clearly show that $dF_{33}(x)$ is a "well-behaved" distribution, even though HMM(3) is not ergodic. However, $dF_{33}(x)$ is not as closely approximated by a log-normal distribution as are $dF_{21}(x)$ and $dF_{22}(x)$. This is evidenced by the discrepancy in Table V between the sample statistics and the statistics that would hold if $dF_{33}(x)$ were truly log-normal.

Ten thousand observation sequences of HMM(1) and HMM(2) were generated, and the posterior likelihood $f_3(O_T)$ was computed using the forward-backward algorithm. The observation sequences are thus mismatched to the posterior likelihood function. Table VI gives the number of sequences for which $f_3(O_T) = 0$. Better than 99% rejection of the simulated ergodic HMM observations was attained when T = 10, that is, when the observation sequences were about as long as the mean first passage time of HMM(3) into state 5. Total rejection of the 10 000 ergodic observations occurred for T = 20.

The ability of $f_3(O_T)$ to reject observations $O_T \in$ HMM(2) is much more impressive than the rejection of $O_T \in$ HMM(3) by $f_2(O_T)$. The lack of symmetry $F_{32}(x) \ne F_{23}(x)$ is striking in this instance. Table VII gives estimates of the mean and standard deviation of $\log dF_{23}(x)$, and Fig. 5 is a histogram of the case T = 25. The mean values of the 10 000 samples and those predicted by (28) agree very well. However, $dF_{23}(x)$ is not as well approximated by a log-normal as $dF_{22}(x)$ and $dF_{21}(x)$, as seen from the discrepancy between the sample and the predicted standard deviations. In any event, it is clear by comparing Table VII to the lower half of Table III that $f_2(O_T)$ cannot reliably distinguish $O_T \in$ HMM(3) from $O_T \in$ HMM(2) when T = 50. However, since the first passage time of HMM(3) is almost certainly less than T = 50, increasing the observation sequence length to improve reliability is not appropriate if the underlying intent is the classification of the transient portion of HMM(3).

Fig. 4. Histogram of 10 000 values of log dF_33(x) for T = 25. (The normal curve has the sample mean and variance given in Table V.)

TABLE V
Comparison of Two Estimates for the Mean and Standard Deviation of log dF_33(x)

          Mean Value            Standard Deviation
   T      Sample    Eq. (28)    Sample    Eq. (29)
   5       -5.6      -4.9       1.92      1.13
  10      -12.8     -11.8       2.30      1.91
  15      -18.6     -18.2       2.72      2.33
  20      -23.5     -22.6       3.22      2.29
  25      -28.1     -26.3       3.61      2.15
  50      -50.5     -47.2       4.59      2.75

TABLE VI
Number of O_T ∈ HMM(i) for Which f_3(O_T) = 0, i = 1, 2

   T      HMM(1)    HMM(2)
   5       9389      9172
  10       9937      9918
  15       9997      9988
  20      10000     10000

TABLE VII
Comparison of Two Estimates for the Mean and Standard Deviation of log dF_23(x)

          Mean Value            Standard Deviation
   T      Sample    Eq. (28)    Sample    Eq. (29)
   5      -10.8     -10.8       0.51      0.60
  10      -21.4     -21.4       0.91      0.89
  15      -32.0     -32.0       1.02      1.04
  20      -42.4     -42.7       1.04      1.15
  25      -53.2     -53.5       1.05      1.12
  50     -106.4    -106.8       1.11      1.43

C. Left-to-Right HMM with Noise

In this example, the effect of noise on the reliability of the $q_{\text{subopt}}$ classifier is assessed for the left-to-right model HMM(3). The right way to study noise in finite symbol HMM's is to add the noise to the original time series s(t) and then analyze the particular preprocessor under consideration to determine the noisy symbol sequence. However, no particular preprocessor is proposed here, so we resort to modeling noise in much the same way that Shannon modeled noisy discrete memoryless channels [7]. This approach can indicate the successful classification rate as a function of the probable number of incorrect symbols in an observation sequence. It cannot, however, assess the effect of signal-to-noise ratio on classification, because such an assessment requires knowledge of the preprocessor.

Fig. 5. Histogram of 10 000 values of log dF_23(x) for T = 25. (The normal curve has the sample mean and variance given in Table VII.)

Denote by $h_{kj}$ the probability that the observation symbol $V_k$ is altered to symbol $V_j$ by the noise mechanism, and define the m × m noise probability matrix $H = [h_{kj}]$. It is assumed that H is independent of the state of the Markov chain and of time t. Consequently, the output of a given HMM corrupted by noise is equivalent to the output of another HMM that is noiseless. If λ = (π, A, B) are the parameters of a given HMM with noise matrix H, the parameters of the equivalent noiseless HMM are $\tilde\lambda = (\pi, A, BH)$. The proof is straightforward. Given that the state of the Markov chain is i, the product $b_{ik} h_{kj}$ is the probability that the given HMM outputs symbol k and that the noise alters it to symbol j. The sum over k of $b_{ik} h_{kj}$ gives the component $\tilde b_{ij}$ of the equivalent noiseless HMM symbol probability matrix $\tilde B$. Clearly, $\tilde b_{ij}$ equals the (i, j) component of the product BH, so that $\tilde B = BH$.

The noise probability matrix H must be row stochastic, that is, every row sum must equal one. The HMM-generated symbol $V_k$ is altered by noise to one of the available symbols, so row k must sum to one.

Because H has row sums equal to one, the matrix $\tilde B$ is a valid symbol probability matrix for the equivalent noiseless HMM; that is, each row of $\tilde B = BH$ sums to one. We have

$$\sum_{j=1}^{m} \tilde b_{ij} = \sum_{j=1}^{m} \sum_{k=1}^{m} b_{ik} h_{kj} = \sum_{k=1}^{m} b_{ik} \sum_{j=1}^{m} h_{kj} = \sum_{k=1}^{m} b_{ik} = 1.$$
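The construction $\tilde B = BH$ is easy to check numerically. The sketch below uses hypothetical helper names. It builds the noise matrix of (32) with m = 8 and $E_T$ = 0.001 and multiplies it into the symbol matrix of HMM(3) from Table IV, stored one row per state. It then confirms that each row of the product still sums to one. Rounded to three digits, the entries agree with the HMM(4) values listed in Table IX.

```python
import math


def noise_matrix(m, E):
    """Row-stochastic H of (32): 1 - E on the diagonal, E/(m - 1) elsewhere."""
    return [[1.0 - E if k == j else E / (m - 1) for j in range(m)] for k in range(m)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


# Symbol probabilities of HMM(3), Table IV, one row per state (8 symbols).
B3 = [[0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
      [0.1, 0.6, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.3, 0.6, 0.1, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.3, 0.0],
      [0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.6, 0.3]]
B4 = matmul(B3, noise_matrix(8, 0.001))    # symbol matrix of the equivalent HMM(4)

print([round(sum(row), 12) for row in B4])  # every row sums to one
print([f"{x:.2e}" for x in B4[0]])          # first state; compare Table IX
```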




The worst case noise probability matrix, denoted $H^0$, has the constant entry $h^0_{ij} = 1/m$ for all i and j. In this case,

$$\tilde b_{ij} = \sum_{k=1}^{m} b_{ik}\, h^0_{kj} = \frac{1}{m} \sum_{k=1}^{m} b_{ik} = \frac{1}{m}.$$

Consequently, all HMM's with noise probability matrix $H^0$ are indistinguishable. In fact, $H^0$ makes all HMM's statistically equivalent to the ergodic HMM(1) given in Table I.
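A quick numerical check of the worst case: with every entry of $H^0$ equal to 1/m, any row-stochastic symbol matrix is mapped to the uniform matrix. The two example rows below are taken from Table IV; any other rows summing to one would do.

```python
import math

m = 8
H0 = [[1.0 / m] * m for _ in range(m)]
B = [[0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],     # state 1 of HMM(3)
     [0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.6, 0.3]]     # state 5 of HMM(3)
BH0 = [[sum(B[i][k] * H0[k][j] for k in range(m)) for j in range(m)]
       for i in range(len(B))]
print(BH0)   # every entry is 1/m = 0.125, up to rounding
```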

TABLE VIII
Number of O_T ∈ HMM(3) + Noise for Which f_3(O_T) = 0 at Various Values of E_T

   T    E_T = 0.1    0.01    0.001    0.0001
   5       2194       234      23        1
  10       3904       443      37        1
  15       5505       451      44       17
  20       4425       904     103       13
  25       7443      1303     121       11
  50       9443      2494     345       34

TABLE IX
Parameters of HMM(4), Rounded to Three Significant Digits

NUMBER OF MARKOV STATES = 5
NUMBER OF SYMBOLS PER STATE = 8
INITIAL STATE PROBABILITY VECTOR:
1.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00
TRANSITION PROBABILITY MATRIX:
6.00E-01  4.00E-01  0.00E+00  0.00E+00  0.00E+00
0.00E+00  7.00E-01  2.00E-01  1.00E-01  0.00E+00
0.00E+00  0.00E+00  6.00E-01  4.00E-01  0.00E+00
0.00E+00  0.00E+00  0.00E+00  7.00E-01  3.00E-01
0.00E+00  0.00E+00  0.00E+00  0.00E+00  1.00E+00
SYMBOL PROBABILITY MATRIX (TRANSPOSED):
8.99E-01  1.00E-01  1.43E-04  1.43E-04  1.43E-04
1.00E-01  5.99E-01  1.43E-04  1.43E-04  1.43E-04
1.43E-04  2.00E-01  3.00E-01  1.43E-04  1.43E-04
1.43E-04  1.00E-01  5.99E-01  1.00E-01  1.43E-04
1.43E-04  1.43E-04  1.00E-01  2.00E-01  1.43E-04
1.43E-04  1.43E-04  1.43E-04  4.00E-01  1.00E-01
1.43E-04  1.43E-04  1.43E-04  3.00E-01  5.99E-01
1.43E-04  1.43E-04  1.43E-04  1.43E-04  3.00E-01

Let $\Pr[V_i]$ be the relative frequency of occurrence of the symbol $V_i$ in observation sequences of length T before the addition of noise. Thus, we have $\sum_i \Pr[V_i] = 1$. After alteration by noise, the probability of correct occurrences of $V_i$ in $O_T$ is $\Pr[V_i]\,h_{ii}$. The probability that the symbol $O(t) \in O_T$ is correct is

$$D_T = \sum_{i=1}^{m} \Pr[V_i]\, h_{ii}, \qquad (30)$$

and the probability that O(t) is incorrect is

$$E_T = 1 - D_T. \qquad (31)$$

For the examples here, given a specific value of $E_T$, we choose the simple noise probability matrix H defined by

$$h_{ii} = 1 - E_T, \ \text{all } i; \qquad h_{ij} = \frac{E_T}{m - 1}, \ \text{all } i \ne j. \qquad (32)$$

For this choice of H, $D_T$ is independent of the actual values of $\Pr[V_i]$, as is clear from (30) and the fact that $\sum_i \Pr[V_i] = 1$.
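With H chosen as in (32), $D_T$ in (30) reduces to $1 - E_T$ whatever the symbol frequencies are, because every diagonal entry equals $1 - E_T$ and the $\Pr[V_i]$ sum to one. The sketch below checks this for a few randomly drawn frequency vectors. The names are hypothetical, and the random frequencies are purely illustrative.

```python
import math
import random


def correct_symbol_probability(freqs, H):
    """D_T of (30): sum over i of Pr[V_i] * h_ii."""
    return sum(p * H[i][i] for i, p in enumerate(freqs))


m, E = 8, 0.01
H = [[1.0 - E if i == j else E / (m - 1) for j in range(m)] for i in range(m)]
rng = random.Random(0)
results = []
for _ in range(3):
    w = [rng.random() for _ in range(m)]
    freqs = [x / sum(w) for x in w]            # arbitrary Pr[V_i] summing to one
    results.append(correct_symbol_probability(freqs, H))
print(results)                                  # each equals 1 - E = 0.99 up to rounding
```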

Noise tends to make observations of all HMM's look like observations of HMM(1), and ergodic observation sequences tend to contain forbidden symbol sequences for the left-to-right HMM(3). The first natural issue is therefore to determine how many forbidden symbol sequences occur as a function of the incorrect symbol probability $E_T$. Table VIII gives the results for various values of T and $E_T$, based on simulations of 10 000 observation sequences. It shows that forbidden symbol sequences are less likely for small T than for large T. Comparing Tables VI and VIII also shows that noisy observations of HMM(3) do not have as high a proportion of forbidden symbol sequences as observations of HMM(1) and HMM(2), even for $E_T$ = 10%. One may conclude from Table VIII that $E_T$ must be small and T must be short to minimize misclassification due to forbidden symbol sequences. For instance, if T = 25 and $E_T$ = 0.001, the false dismissal probability is apparently at least 1.21%. Shorter T, however, causes smaller shifts in the statistics of the likelihood function and thus increases the misclassification rate. Consequently, a tradeoff exists between short T and long T.

The total false dismissal probability can be expressed as the sum of two parts: the false dismissal probability due to forbidden symbol sequences, and the false dismissal probability due to the noise-induced shift in the statistics of the nonzero values of the posterior likelihood function. We examine the total false dismissal probability for HMM(4), which is the HMM equivalent to HMM(3) with the noise matrix H given by (32) with $E_T$ = 0.001. The parameters of HMM(4) are given explicitly in Table IX.

Denote  by  (x)  the  cumulative  distribution  function 

F«(x)  =  [F*(x)  -  Fj, (0)]/[Fj, (oo)  -  Fy(0)].  (33) 

Ten thousand observation sequences $O_T$ were generated from HMM(4) for T = 25. As given in Table VIII, 121 sequences resulted in zero posterior likelihood function values (that is, $f(O_T) = 0$), and the remaining 9879 nonzero values of $f(O_T)$ give the histogram shown in Fig. 6. By comparison to Fig. 4, no significant difference between $\log d\tilde F_{43}(x)$ and $\log dF_{33}(x)$ is evident. Therefore, the misclassification rate due to noise-induced shifts in the statistics of $d\tilde F_{43}(x)$ is very small. The suboptimal classifier $q_{\mathrm{subopt}}$ for HMM(3) thus gives 98.8% correct classification and a 1.2% false dismissal probability when used with noisy observations characterized by $O_T \in$ HMM(4).

Because $E_T$ = 0.001 in this example, each observation sequence $O_{25}$ has a probability of approximately 0.025 of having at least one incorrect symbol. Of 10 000 observation sequences, the expected number with at least one incorrect symbol is therefore approximately 250. Nearly half (121) contained forbidden symbol sequences and caused the only significant misclassification problem. The other half apparently made no contribution to the probability of false dismissal.
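The two figures above can be checked with a short calculation. This is a minimal sketch that assumes each of the T symbols is corrupted independently with probability $E_T$. That is an assumption about the noise matrix (32), which is not reproduced here. The helper name is hypothetical.

```python
def prob_at_least_one_error(e_t: float, t: int) -> float:
    """Probability that a length-t symbol sequence contains at least one
    incorrect symbol, assuming independent symbol errors of probability e_t."""
    return 1.0 - (1.0 - e_t) ** t


p = prob_at_least_one_error(0.001, 25)   # E_T = 0.001, T = 25
expected = 10_000 * p                    # out of 10 000 simulated sequences
print(f"p = {p:.4f}, expected count = {expected:.1f}")
# p = 0.0247, expected count = 247.0
```

The exact values, 0.0247 and about 247 sequences, round to the 0.025 and 250 quoted in the text.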




IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 38, NO. 4, APRIL 1990


Fig. 6. Histogram of 9879 samples of $\log d\tilde F_{43}(x)$ for T = 25. (The normal curve has the sample mean = -28.156 and the variance = 3.6167.)

It would be desirable to be able to compute the moments of $\tilde F_{ij}(x)$ instead of $F_{ij}(x)$. Alternatively, it would be desirable to be able to compute the amplitude of the impulse (delta function) in $dF_{ij}(x)$ that seems to be present in the left-to-right HMMs considered here. In other words, if we write

$$dF_{ij}(x) = A_0\,\delta(x) + (1 - A_0)\, d\tilde F_{ij}(x), \qquad (34)$$

then an algorithm to compute $A_0$ directly would be worthwhile. Knowing $A_0$ and the moments of $F_{ij}$ gives the moments of $\tilde F_{ij}(x)$. However, developing such an algorithm requires further work.
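The last step of that argument can be illustrated with a short sketch. It assumes the decomposition $dF_{ij} = A_0\delta(x) + (1 - A_0)\,d\tilde F_{ij}$, with $F_{ij}(\infty) = 1$ and $\tilde F_{ij}$ normalized as in (33). Under those assumptions the impulse at x = 0 contributes nothing to any moment of order k >= 1, so each such moment of $\tilde F_{ij}$ is the corresponding moment of $F_{ij}$ divided by $1 - A_0$. The function name and the toy sample are hypothetical.

```python
def moment_of_nonzero_part(a0: float, moment_k: float, k: int) -> float:
    """k-th moment of the normalized nonzero part F~_ij from the k-th moment of
    the full distribution F_ij and the impulse weight a0 = F_ij(0).

    Assumes dF_ij = a0*delta(x) + (1 - a0)*dF~_ij; for k >= 1 the impulse at
    x = 0 adds nothing to the moment of F_ij."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if not 0.0 <= a0 < 1.0:
        raise ValueError("a0 must lie in [0, 1)")
    return moment_k / (1.0 - a0)


# Empirical check on a toy sample that contains zero likelihood values.
samples = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
a0 = samples.count(0.0) / len(samples)                  # weight of the impulse
nonzero = [x for x in samples if x != 0.0]
m1_full = sum(samples) / len(samples)                   # first moment of F_ij
m2_full = sum(x * x for x in samples) / len(samples)    # second moment of F_ij
print(moment_of_nonzero_part(a0, m1_full, 1), sum(nonzero) / len(nonzero))
```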

IV.  Concluding  Remarks 

If the distribution $dF_{ij}(x)$ is approximately log-normal, the first two moments $M_{ij}(1,T)$ and $M_{ij}(2,T)$ can be used to develop a continuous approximation to $dF_{ij}(x)$. Simulations suggest that $dF_{ij}(x)$ is approximately log-normal whenever HMM(i) and HMM(j) are ergodic and nontrivial. (A “trivial” HMM is an HMM whose likelihood function $f(O_T)$ is constant.) A proof of approximate log-normality that relies on the central limit theorem is not obvious in the present context. If the distribution $dF_{ij}(x)$ is not approximately log-normal, the higher order moments $M_{ij}(k,T)$ are needed to develop reasonable continuous approximations to $dF_{ij}(x)$. The forward-backward moment algorithm presented in this paper computes these moments explicitly from the defining HMM parameter sets.
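One way to build the continuous approximation from $M_{ij}(1,T)$ and $M_{ij}(2,T)$ is to match a log-normal law to the two moments (the method of moments). For a log-normal X with $\ln X \sim N(\mu, \sigma^2)$, $E[X] = e^{\mu + \sigma^2/2}$ and $E[X^2] = e^{2\mu + 2\sigma^2}$. The sketch below inverts these relations. It takes the logarithms of the moments as input, because likelihood moments can be far too small to represent directly. This is an illustration of the general idea, not the paper's procedure; the function names are hypothetical.

```python
import math


def lognormal_from_log_moments(log_m1: float, log_m2: float):
    """Fit a log-normal law to ln M(1,T) and ln M(2,T).

    Returns (mu, sigma2), the mean and variance of ln X, obtained by inverting
    E[X] = exp(mu + sigma2/2) and E[X^2] = exp(2*mu + 2*sigma2)."""
    sigma2 = log_m2 - 2.0 * log_m1
    if sigma2 <= 0.0:
        raise ValueError("moments are inconsistent with a non-degenerate log-normal law")
    mu = log_m1 - 0.5 * sigma2
    return mu, sigma2


def lognormal_cdf(x: float, mu: float, sigma2: float) -> float:
    """Approximate P(X <= x), x > 0, under the fitted log-normal law."""
    z = (math.log(x) - mu) / math.sqrt(sigma2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu, sigma2 = lognormal_from_log_moments(1.25, 3.0)
print(mu, sigma2, lognormal_cdf(math.exp(mu), mu, sigma2))
```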

The use of the suboptimal classification statistic $q_{\mathrm{subopt}}$ in preference to the optimum statistic $q_{\mathrm{opt}}$ is probably not appropriate in many speech applications, because both likelihoods needed to form the likelihood ratio $q_{\mathrm{opt}}$ are readily available. Unfortunately, simulations are required to determine the ROC curves for $q_{\mathrm{opt}}$. Consequently, for applications that require a small incorrect classification probability and a high probability of correct classification, very large simulations are necessary to confidently establish the required performance. An alternative in this case is to use the suboptimal statistic $q_{\mathrm{subopt}}$, because its ROC curves can in principle be approximated to any required accuracy without simulations.

The suboptimal statistic $q_{\mathrm{subopt}}$ is identical (to within a constant scale factor) to the optimal statistic $q_{\mathrm{opt}}$ when the problem is more akin to detection than to classification. That is, if the application is that of distinguishing the presence of a signal embedded in noise from the presence of noise alone, and if the HMM noise model is a “trivial” model as defined above, then the optimal detection statistic and $q_{\mathrm{subopt}}$ are identical. As a result, in this case, the moments of the optimal detection statistic can be computed using the forward-backward moment algorithm, and the ROC curves for the optimal detection statistic can be approximated to any required accuracy.

References 

[1] L. R. Rabiner, S. E. Levinson, and M. M. Sondhi, "On the application of vector quantization and hidden Markov models to speaker-independent isolated word recognition," Bell Syst. Tech. J., vol. 62, pp. 1075-1105, Apr. 1983.

[2] S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, "An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition," Bell Syst. Tech. J., vol. 62, pp. 1035-1074, Apr. 1983.

[3] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: Wiley, 1968, sect. 2.2.2.

[4] A. Papoulis, Probability, Random Variables and Stochastic Processes. New York: McGraw-Hill, 1965, p. 158.

[5] L. A. Liporace, "Maximum likelihood estimation for multivariate observations of Markov sources," IEEE Trans. Inform. Theory, vol. IT-28, pp. 729-734, Sept. 1982.

[6] J. G. Kemeny and J. L. Snell, Finite Markov Chains. Princeton, NJ: Van Nostrand, 1960, ch. 3.

[7] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, 1948.

[8] R. L. Streit and R. F. Barrett, "Frequency line tracking using hidden Markov models," IEEE Trans. Acoust., Speech, Signal Processing, this issue, pp. 586-598.


Roy L. Streit (SM'84) was born in Guthrie, OK, on October 14, 1947. He received the B.A. degree (with Honors) in mathematics and physics from East Texas State University, Commerce, in 1968, the M.A. degree in mathematics from the University of Missouri, Columbia, in 1970, and the Ph.D. degree in mathematics from the University of Rhode Island, Kingston, in 1978.

He was a Visiting Scholar in the Department of Operations Research, Stanford University, Stanford, CA, during 1981-1982, and an Exchange Scientist in the Signal Processing and Classification Group at the Defence Science and Technology Organization, Adelaide, South Australia, from 1987 to 1989. He joined the staff of the Naval Underwater Systems Center (then the Navy Underwater Sound Laboratory), New London, CT, in 1970. He is an Applied Mathematician and has published work in several areas, including towed array design, complex function approximation, semi-infinite programming, and applications of hidden Markov models. His current interests include image analysis, tracking problems, and training algorithms for neural networks.




Connection  Machine  Implementation 
Of Hidden Markov Models
For  Frequency  Line  Tracking 


J.  L.  Munoz  and  R.  L.  Streit 




Chapter  14 

Connection  Machine  Implementation  of  Hidden 
Markov  Models  for  Frequency  Line  Tracking 

Jose L. Munoz* and Roy L. Streit*


1  Introduction 

In [1] Streit and Barrett explored the use of Hidden Markov Models (HMMs) for frequency line tracking and detection. This paper explores the implementation of the method of [1] on the Connection Machine (CM), a massively parallel SIMD machine, using its data parallel programming model. The reader is referred to [1] for specific details of the HMM and its application to frequency line tracking and detection. Frequency tracking is the estimation of the frequency trajectory of a tone whose frequency changes as a function of time.

As presented in [1], HMM processing produces two principal outputs: (1) the Viterbi track (a discrete track), and (2) the Mean Cell Occupancy track (a continuous track). The CM implementation of HMM produces these two tracks and augments them with an additional probability field display, not previously explored, that represents the quality of the track estimate.

2  Background 

It is necessary to first discuss the elements of HMM in order to present the details of its implementation on the CM. The frequency space is divided into gates, each a contiguous set of FFT frequency bins, with at most one signal allowed in a gate. Input to HMM is a sequence of measurements, z[t], over time within a gate, where a measurement represents an estimated frequency state. A measurement at frequency state i is said to exist if the magnitude of the signal at frequency bin i is greater than a predetermined threshold and is greater than the magnitudes at the other frequency bins in the gate. Such a measurement is said to be in HMM state i. If no magnitude within the gate meets these criteria, then the measurement is said to be in the HMM zero state.
A batch of size T consists of a sequence of these measurements, Z, over T consecutive FFT samples, i.e., Z = z[0], z[1], z[2], ..., z[T - 1], where z[0] represents the oldest measurement and z[T - 1] the current measurement.

*Naval Underwater Systems Center, New London, CT 06320, e-mail: munoz@nusc.arpa.

Figure 1: Characteristics of A and B matrices. (A matrix, state transitions: remain in current cell, remain in zero states, track terminate. B matrix: measurement in correct cell, measurement in incorrect cell.)
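The measurement rule and the batch Z described above can be sketched serially. This is a minimal Python sketch, not the CM implementation. It assumes the bins of a gate are numbered 1, ..., n, with 0 reserved for the zero state. It also treats a tie for the largest magnitude as no measurement, because the rule requires the peak to exceed every other bin. The function names are hypothetical.

```python
def gate_measurement(magnitudes, threshold):
    """HMM measurement for one gate of FFT magnitudes.

    Returns i (1..n) when bin i exceeds `threshold` and is strictly larger
    than every other bin in the gate; otherwise returns 0, the zero state."""
    peak = max(magnitudes)
    if peak > threshold and magnitudes.count(peak) == 1:
        return magnitudes.index(peak) + 1
    return 0


def batch_measurements(gate_spectra, threshold):
    """Z = z[0], ..., z[T-1] for one gate, oldest first (one spectrum per FFT sample)."""
    return [gate_measurement(spectrum, threshold) for spectrum in gate_spectra]


print(batch_measurements([[0.1, 0.9], [0.8, 0.2], [0.1, 0.1]], 0.5))   # [2, 1, 0]
```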

Other key elements of HMM are an A matrix and a B matrix, each of size (n + 1) × (n + 1), where n represents the number of HMM states (the additional one accounts for the HMM zero state). The A matrix is the transition probability matrix of a Markov chain that represents the likely extent of the frequency fluctuations, track initiation and track termination. Element a[i,j] is the likelihood that the signal will be in state j at time t + 1 given that it is in state i at time t, where i, j = 0, ..., n. The B matrix represents the connection between the measurement at time t and the underlying state at time t. Element b[i,j] is the likelihood that the measurement was made at state j given that the signal was actually in state i, where i, j = 0, ..., n. Finally, an initiation vector π represents the signal likelihood at time 0. Element π[j] is the probability that the Markov chain is in state j at time t = 0, where j = 0, ..., n. The π vector is typically a row of A. It is via the π vector that one batch of data is associated with the next: the π vector for the new batch is based on information obtained from the previous batch. Characteristics of the A and B matrices can be seen in Figure 1. Details on calculating A and B can be found in [1]. The B matrix is a function of the signal-to-noise ratio (SNR), and therefore a particular HMM frequency tracker is SNR specific.

The CM implementation has two principal sections: (1) initialization and (2) processing. In the initialization section, processing and display geometries are defined and the A and B matrices are calculated and loaded into the CM. The processing section is divided into two parts, one for Viterbi and the other for Mean Cell Occupancy (MCO). They share a common display and data generation section.


3  Approach  and  Observations 

The  CM  is  a  data  parallel  architecture  and  consequently  performs  best  when 
it  can  work  on  massive  amounts  of  data  in  parallel.  The  approach,  therefore, 
attempts  to  maximize  the  potential  for  data  parallelization. 

Because the gates are independent of each other, parallelization among the gates is straightforward, and such a parallelization would work just as well in an MIMD architecture. Beyond that, the opportunity for data parallelization exists only within a gate and among the various time steps. Both the Viterbi and MCO algorithms are dynamic programming models with a significant sequential dependence along the time domain. Consequently, it is not possible to capitalize on any parallelization in the time domain with the algorithms as described in [1]. Therefore, the initial effort focused on parallelizing data at each time instant within a gate.

This was achieved by replicating data within a segment, where a segment consists of the gated data plus the zero state. This made it possible to take advantage of vector multiplications and/or additions wherever possible. To exploit the parallel computation, a transposed version of the A matrix, A^T, was required in order to facilitate working with the π vector and with (10). Hence, the π vector is obtained from columns of A^T.

Because send communication is about twice as fast as get, every attempt was made to avoid a get operation (indeed, none are used). A strict interpretation of the algorithms implies a get operation, either getting a value from a past or a future time step. Therefore, it was found necessary to implement an equivalent form of the backward recursions so that the data for the previous iteration (i.e., past time step) is calculated from the current time step (details may be found in the algorithm description).

The CM supports two types of send operations: NEWS (or nearest-neighbor) sends and general communication (or remote) sends. For the virtual processor (VP) ratios and data element sizes typically encountered in the application, remote sends were found to take about twice as long as NEWS sends. Therefore, NEWS send is the preferred communication method. However, sending information from any particular time step to its “past” or “future” neighbor nominally requires remote communication, since we do not assume any a priori relationship between the measurements z[t] and z[t + 1] or z[t - 1]. In order to replace the remote send with a NEWS send, the data is first replicated everywhere within the segment and then a NEWS send is executed. Data replication was accomplished via scan_with_add “upward” followed by scan_with_copy “downward” operations within the defined segments. Note that the remote forms of the send would have required (1) obtaining the state in the “future” or “past” time frame which is to receive the data, (2) calculating the send address either in situ or from a predefined table of entries (via aref), and (3) then performing the send operation. Execution timings of both approaches confirmed, although by a narrow margin, that the replicate/NEWS send is preferable to the remote send. Once the data is transferred to the appropriate time frame, the correct state is guaranteed immediate local access to the data, since all states have identical copies of it. This made the implementation of (2), (3), (5), (8) and (10) straightforward.
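The replicate-then-NEWS-send pattern above can be emulated on an ordinary list to show the data movement. This is a serial sketch, not CM or Paris code. It assumes fixed-length segments in which only one element holds the value to be replicated and every other element holds 0. The names segmented_replicate and news_send_down are hypothetical, and the mapping of the text's “upward” and “downward” to the two scan directions is also an assumption.

```python
def segmented_replicate(values, seg_len):
    """Broadcast the single nonzero value of each segment to all of its elements.

    Assumes len(values) is a multiple of seg_len and that every element except
    the one being replicated holds 0."""
    out = list(values)
    for start in range(0, len(out), seg_len):
        end = start + seg_len
        # Inclusive add-scan within the segment: the last element ends up
        # holding the segment total, i.e. the value to be replicated.
        for k in range(start + 1, end):
            out[k] += out[k - 1]
        # Copy-scan in the opposite direction: spread that last element back
        # over the whole segment.
        for k in range(end - 2, start - 1, -1):
            out[k] = out[k + 1]
    return out


def news_send_down(rows, fill=None):
    """Nearest-neighbour shift along the time axis: row t receives row t - 1."""
    return [fill] + rows[:-1]


print(segmented_replicate([0, 5, 0, 0, 0, 7], 3))   # [5, 5, 5, 7, 7, 7]
```

After replication, every state in a segment holds the same value. The shift along time then needs only nearest-neighbour traffic, and the receiving state finds its copy locally.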


3.1  Geometry  and  Initialization 

A 3-D geometry is used for the processing, as shown in Figures 2a and 2b. The x-axis performs double duty: both frequency and state information are maintained along the x-axis. The y-axis represents time, with the oldest data at y = 0 and the newest data at y = (T - 1), where T is the batch size (i.e., the number of FFT samples processed as a unit). The z-axis represents, along with the x-axis, state space. The size of the geometry is:

([frequency + frequency/(n_states + 1)]) * ([T]) * ([n_states + 1]) / 2

where the square brackets represent the smallest power of two that is greater than or equal to the value being evaluated. The value n_states is the size of the gate used for the HMM processing; one is added to account for the zero state. The frequency axis is augmented by one zero state per gate (represented by the frequency/(n_states + 1) term). Finally, the division by 2 is required to account for the FFT process.
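The size formula can be evaluated with a small helper. This is a minimal sketch: the function names are hypothetical, integer division is assumed for the frequency/(n_states + 1) term, and the example sizes (512 frequency values, a gate of 15 bins, T = 32) are illustrative only.

```python
def next_pow2(x: int) -> int:
    """Smallest power of two >= x (the square brackets in the size formula)."""
    p = 1
    while p < x:
        p *= 2
    return p


def geometry_size(frequency: int, n_states: int, T: int) -> int:
    """Number of virtual processors in the 3-D HMM geometry.

    frequency: number of frequency values; n_states: gate size; T: batch size.
    frequency // (n_states + 1) stands for the added zero-state cells."""
    x = next_pow2(frequency + frequency // (n_states + 1))
    return x * next_pow2(T) * next_pow2(n_states + 1) // 2


print(geometry_size(512, 15, 32))   # 262144
```

On an 8192-processor machine, 262144 virtual processors would correspond to a VP ratio of 32.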

The z-axis is loaded from the front-end at y = 0 with copies of the previously calculated A, B, and A^T matrices, with each segment getting the same copy. Once the matrices are loaded, their natural logarithms are evaluated to facilitate Viterbi processing, and an initial π vector is obtained as column zero of A^T. All of the matrices are loaded such that rows are along the x-axis and columns are along the z-axis. Finally, all of this data is replicated along the y-axis via a spread operation. The x-axis is divided into segments of size (nstate+1), with the frequency information loaded into the cells (no frequency data is entered at the zero state cells). Previously evaluated measurements are first scrolled “up,” i.e., towards older time, with the oldest time information removed from the batch. New FFT data is inserted at T - 1 from the front-end host, and a measurement is performed at each gate in parallel, with the measurement subsequently copied along the z-axis.


Figure 2a: X-Y plane of HMM compute geometry.

Figure 2b: X-Z plane of HMM compute geometry (x-axis: frequency, i.e., state).

4  Processing 

4.1  Viterbi  Processing 

4.1.1  Viterbi  forward  processing 

Calculation  of  the  Viterbi  track  as  presented  in  [1]  begins  with  the  following 
algorithm: 


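Forward Viterbi: z[t], measurement at time t

$$t = 0: \quad \delta[0,j] = \ln \pi[j] + \ln b[j, z[0]], \quad j = 0,\dots,n \qquad (1)$$

$$\psi[0,j] \ \text{arbitrary}$$

$$t = 1,\dots,T-1: \quad \delta[t,j] = \ln b[j, z[t]] + \max_{i=0,\dots,n} \{\delta[t-1,i] + \ln a[i,j]\}, \quad j = 0,\dots,n \qquad (2)$$

$$\psi[t,j] = \operatorname*{argmax}_{i=0,\dots,n} \{\delta[t-1,i] + \ln a[i,j]\}, \quad j = 0,\dots,n \qquad (3)$$

where argmax returns the smallest index at which the maximum is attained. The natural logarithm is used to control potential numerical underflow.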

From (1) through (3) we see that Viterbi forward processing starts at time t = 0 (i.e., y = 0), selects a column of ln B as identified by z[0], and adds ln π to that column, resulting in δ[0,j]. This δ is first replicated and then sent down to time = 1 into the CM field prev_delta. For time > 0, the prev_delta vector is added to each column of ln A. A maximum is evaluated along the z-axis (columns), and the smallest index at which the maximum occurred (argmax function) is also obtained. This row vector of maximums is transposed into a column vector and replicated everywhere in the segment. z[t] is used to select a column from ln B, and the vector of maximums is added to it, creating δ[t,j] for this time step, which is subsequently replicated within the segment. The results of the argmax evaluation are stored in vector ψ for use by the backward processing. The δ[t,j] just calculated is then sent down to the next (more recent) time step into its prev_delta value, and the process is repeated for T - 1 time steps.
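For reference, equations (1) to (3) can be written as a serial log-domain loop. This is a minimal Python sketch, not the CM data-parallel version. It assumes the probabilities are passed as nested lists indexed as in the text (a[i][j] for a transition from i to j, b[j][m] for measurement m in state j), maps ln 0 to minus infinity, and sets the arbitrary ψ[0, j] to 0. The two-state example parameters are illustrative only.

```python
import math


def safe_log(p: float) -> float:
    """Natural log with ln(0) mapped to -inf (zero-probability entries)."""
    return math.log(p) if p > 0.0 else -math.inf


def viterbi_forward(pi, a, b, z):
    """Log-domain forward Viterbi pass, eqs. (1)-(3); returns (delta, psi)."""
    n1, T = len(pi), len(z)
    delta = [[0.0] * n1 for _ in range(T)]
    psi = [[0] * n1 for _ in range(T)]            # psi[0][j] is arbitrary (0 here)
    for j in range(n1):                            # eq. (1)
        delta[0][j] = safe_log(pi[j]) + safe_log(b[j][z[0]])
    for t in range(1, T):
        for j in range(n1):
            scores = [delta[t - 1][i] + safe_log(a[i][j]) for i in range(n1)]
            best = max(scores)
            psi[t][j] = scores.index(best)         # eq. (3): smallest maximizing index
            delta[t][j] = safe_log(b[j][z[t]]) + best   # eq. (2)
    return delta, psi


A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.9, 0.1], [0.2, 0.8]]
delta, psi = viterbi_forward([0.5, 0.5], A, B, [0, 0, 1, 1, 1])
print(psi)   # [[0, 0], [0, 1], [0, 0], [0, 1], [0, 1]]
```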

4.1.2  Viterbi  backward  processing 

The  following  algorithm  is  used  for  Viterbi  backward  processing: 


Backward Viterbi:

$$t = T-1: \quad V[T-1] = \operatorname*{argmax}_{j} \{\delta[T-1,j]\}, \qquad (4)$$

$$t < T-1: \quad V[t] = \psi[t+1, V[t+1]], \quad t = T-2,\dots,0. \qquad (5)$$

At this point ψ and the V vector are defined for all time steps. The Viterbi backward processing begins at time = (T - 1) by determining the index at which δ[T - 1, j] is a maximum, resulting in V[T - 1]. Once this is initialized, it is possible to calculate V for the previous time steps from the current time step by rewriting (5) as:

$$V[t-1] = \psi[t, V[t]]. \qquad (6)$$

Consequently, it is only necessary to index into the current ψ vector using the current V to obtain the V for the previous time step. The value obtained is sent up to the V field of the previous time step. This process is repeated for T - 1 time steps. The Viterbi track estimate for the batch of size T is taken to be V at time = T/2.
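The backtracking of (4) and (6), together with the choice of V at time T/2 as the track estimate, can be sketched as follows. This is a minimal serial sketch. The ψ table and the final log-δ values in the example are illustrative, and T // 2 is assumed as the meaning of T/2 when T is odd.

```python
def viterbi_backward(psi, delta_last):
    """Backtrack per eqs. (4) and (6); returns the Viterbi track V[0..T-1]."""
    T = len(psi)
    V = [0] * T
    # eq. (4): state maximizing delta[T-1, j]; max() keeps the smallest index on ties.
    V[T - 1] = max(range(len(delta_last)), key=lambda j: delta_last[j])
    # eq. (6): V[t-1] = psi[t, V[t]], working back from t = T-1 to t = 1.
    for t in range(T - 1, 0, -1):
        V[t - 1] = psi[t][V[t]]
    return V


psi = [[0, 0], [0, 1], [0, 0], [0, 1], [0, 1]]
delta_last = [-8.23, -4.19]        # illustrative values of delta[T-1, j] (log domain)
V = viterbi_backward(psi, delta_last)
track_estimate = V[len(V) // 2]    # V at time T/2, the value displayed
print(V, track_estimate)           # [0, 0, 1, 1, 1] 1
```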

4.1.3  Display  and  new  data 

Once  the  Viterbi  track  has  been  evaluated  the  V  value  at  time  =  T/2  is 
displayed  on  the  CM  frame  buffer  in  a  waterfall  display,  i.e.,  newest  data 
at  the  top  with  the  oldest  data  “falling  off  of  the  display”  at  the  bottom. 
Each  HMM  gate  is  associated  with  a  particular  color.  (More  generally,  the 
V  value  at  any  particular  time  could  be  selected  for  display;  time  =  T/2  was 
chosen  on  the  basis  of  estimated  track  variance.) 

Following data display, a new π vector is selected to associate the next batch with the current batch. The V field at time = 0 is used to index into the A^T matrix to select the appropriate column to be used as the next π vector. The entire Viterbi process is then repeated when new FFT data is made available.
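Selecting column V[0] of A^T is the same as taking row V[0] of A, i.e., the transition probabilities out of the state the track occupied at time 0. A minimal sketch with a hypothetical function name and an illustrative A:

```python
def next_pi_viterbi(a, V):
    """Next-batch pi: column V[0] of A^T, which equals row V[0] of A."""
    a_t = [list(col) for col in zip(*a)]        # explicit transpose A^T
    return [a_t[i][V[0]] for i in range(len(a_t))]


A = [[0.9, 0.1], [0.2, 0.8]]
print(next_pi_viterbi(A, [1, 1, 0]))   # [0.2, 0.8]
```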

4.2  Mean  Cell  Occupancy  Processing  (MCO) 

4.2.1  Calculation  of  alpha  (forward  processing) 

MCO processing is analogous to Viterbi processing. The algorithm for forward processing is:

Forward probabilities: z[t], measurement at time t

$$t = 0: \quad \alpha[0,j] = \pi[j]\, b[j, z[0]] / s[0], \quad j = 0,\dots,n \qquad (7)$$

$$t = 1,\dots,T-1: \quad \alpha[t,j] = b[j, z[t]] \sum_{i=0}^{n} \alpha[t-1,i]\, a[i,j] \Big/ s[t], \quad j = 0,\dots,n \qquad (8)$$

The quantity s[t] in (7) and (8) is a scale factor required to control numerical underflow in the α's.

For time = 0: An α vector at time = 0 is initially calculated by multiplying the π vector with a column of the B matrix as identified by the measurement z[0]. A scale factor, s[0], is evaluated from this α and used to scale the α just obtained. This same scale factor is later used in the β (backward) processing. The resulting α is then replicated within the segment and subsequently sent down to the field prev_alpha in the next time frame (i.e., more recent time).

For time > 0: The prev_alpha vector received is multiplied by the A matrix, producing (nstate+1) column vectors. A sum along the columns is performed and the resulting row vector is transposed into a column vector that is then replicated everywhere in the segment. This is then multiplied by a column from the B matrix as identified by the measurement z[t], producing α[t,j]. A scale factor, s[t], is calculated, used to scale the α[t,j] just obtained, and saved for later use by the β processing. The resulting α[t,j] is then replicated along the x-axis and subsequently sent down to the next time frame, with the process repeated for T - 1 time steps.
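A serial sketch of (7) and (8) follows. The excerpt does not say how s[t] is computed, so this sketch assumes the common choice s[t] = sum of the unscaled α[t, :]. With that choice each scaled α[t, :] sums to 1, and the product of the s[t] equals the probability of the measurement sequence. The function name and the example parameters are hypothetical.

```python
def forward_scaled(pi, a, b, z):
    """Scaled forward recursion, eqs. (7)-(8); returns (alpha, s).

    Assumes s[t] is the sum of the unscaled alpha[t, :]."""
    n1, T = len(pi), len(z)
    alpha, s = [], []
    for t in range(T):
        if t == 0:
            raw = [pi[j] * b[j][z[0]] for j in range(n1)]           # eq. (7) before scaling
        else:
            prev = alpha[t - 1]
            raw = [b[j][z[t]] * sum(prev[i] * a[i][j] for i in range(n1))
                   for j in range(n1)]                               # eq. (8) before scaling
        s_t = sum(raw)
        s.append(s_t)                                                # kept for the beta pass
        alpha.append([v / s_t for v in raw])
    return alpha, s


A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.9, 0.1], [0.2, 0.8]]
alpha, s = forward_scaled([0.5, 0.5], A, B, [0, 0, 1])
print([round(v, 4) for v in s])
```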

4.2.2  Calculation  of  beta  (backward  processing) 

The beta, or backward, processing follows the alpha processing. The algorithm is:

Backward probabilities: z[t], measurement at time t

$$t = T-1: \quad \beta[T-1,j] = 1/s[T-1] \qquad (9)$$

$$t = T-2,\dots,0: \quad \beta[t,j] = \sum_{i=0}^{n} a[j,i]\, b[i, z[t+1]]\, \beta[t+1,i] \Big/ s[t], \quad j = 0,\dots,n. \qquad (10)$$

As before, s[t] in (9) and (10) is a scale factor required to control numerical underflow. The same scale factor previously calculated for α[t,j] is used for the β's.

Beta processing goes backward in time from (T - 1) to zero. β for time = (T - 1) is set to 1.0 scaled by the scale factor obtained at T - 1. Once initialized in this manner, it is possible to calculate β for the previous time step from the current time step by rewriting (10) as:

$$\beta[t-1,j] = \sum_{i=0}^{n} a[j,i]\, b[i, z[t]]\, \beta[t,i]. \qquad (11)$$

The beta processing therefore proceeds by selecting the appropriate column from the B matrix using z[t] as the index and replicating that column everywhere within the segment. This is then multiplied by the β for the current time step (which has been previously scaled). This result is then multiplied by the A^T matrix, producing (nstate+1) column vectors. A summation along each column is then performed and the resulting row vector is subsequently transposed to form a column vector. The resulting vector is then replicated everywhere within the gate along the x-axis and sent up as the β for the previous time step. Once it is received by the previous time step, it is scaled using the scale factor for that time step. The process is then repeated T - 1 times.
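The same data flow can be sketched serially. Each step forms the unscaled sum of (11) for time t - 1 from time t and divides by s[t - 1] once it “arrives”, which reproduces (10). This is a minimal sketch with a hypothetical function name. In the usage line, the scale factors 2.0 and 4.0 are arbitrary values chosen only to check the arithmetic.

```python
def backward_scaled(a, b, z, s):
    """Scaled backward recursion, eqs. (9)-(11), reusing forward scale factors s[t]."""
    n1, T = len(a), len(z)
    beta = [[0.0] * n1 for _ in range(T)]
    beta[T - 1] = [1.0 / s[T - 1]] * n1                                  # eq. (9)
    for t in range(T - 1, 0, -1):
        for j in range(n1):
            total = sum(a[j][i] * b[i][z[t]] * beta[t][i] for i in range(n1))  # eq. (11)
            beta[t - 1][j] = total / s[t - 1]                            # scaled on receipt
    return beta


beta = backward_scaled([[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.2, 0.8]], [0, 1], [2.0, 4.0])
print(beta)
```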

4.2.3  Gamma  and  MCO  processing 

Following the calculation of α and β, the next step is to calculate the state occupancy probabilities, γ's, and the Mean Cell Occupancy. The algorithm is:

State occupancy probabilities (track quality):

$$\gamma[t,i] = \alpha[t,i]\,\beta[t,i] \Big/ \sum_{k=0}^{n} \alpha[t,k]\,\beta[t,k] \qquad (12)$$

Mean Cell Occupancy (M):

$$M[t] = \sum_{i=1}^{n} \gamma[t,i]\, f_c[i] \Big/ \sum_{i=1}^{n} \gamma[t,i], \qquad (13)$$

where $f_c[i] = (f[i] + f[i+1])/2$, and

$$\sigma^2[t] = \sum_{i=1}^{n} \gamma[t,i]\,(f_c[i] - M[t])^2 \Big/ (1 - \gamma[t,0]). \qquad (14)$$

The γ calculations are executed in parallel for all time steps.

The α and β vectors are multiplied and the resulting vector is stored temporarily. The elements of this resulting vector are summed, and the stored vector is divided by this sum, resulting in γ[t,i] for each time step. This value is then copied as a column vector into gamma_pi for subsequent use as the π vector for the next batch. The γ column vector is then transformed into a row vector via a transposition. The resulting row vector is then multiplied by a previously calculated and stored row vector, fc, whose ith element is the center frequency of the ith cell. Elements 1...nstate of the resultant row vector are added and the result is divided by (1.0 - γ[t,0]). The result is the mean cell occupancy, MCO, a single variable that is then replicated everywhere in the segment. (fc - MCO)² is then calculated and multiplied by γ[t,i]; elements 1...nstate of the resultant row vector are summed and divided by (1.0 - γ[t,0]), and a final square root yields the standard deviation σ, completing the γ calculations.
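For a single time step, (12) to (14) reduce to a few lines. This is a minimal serial sketch. It assumes the center frequencies fc[1..n] are already available (fc[0] is a placeholder for the zero state), and it takes the square root of (14) to obtain σ, as the description does. The function name and example values are hypothetical.

```python
import math


def occupancy_stats(alpha_t, beta_t, fc):
    """gamma (eq. 12), mean cell occupancy (eq. 13) and sigma (sqrt of eq. 14)
    at one time step; index 0 is the zero state and fc[0] is ignored."""
    prod = [a * b for a, b in zip(alpha_t, beta_t)]
    total = sum(prod)
    gamma = [p / total for p in prod]                                    # eq. (12)
    occupied = 1.0 - gamma[0]                                            # = gamma[1] + ... + gamma[n]
    cells = range(1, len(gamma))
    mco = sum(gamma[i] * fc[i] for i in cells) / occupied                # eq. (13)
    var = sum(gamma[i] * (fc[i] - mco) ** 2 for i in cells) / occupied   # eq. (14)
    return gamma, mco, math.sqrt(var)


gamma, mco, sigma = occupancy_stats([0.2, 0.3, 0.5], [1.0, 1.0, 1.0], [0.0, 10.0, 20.0])
print(round(mco, 4), round(sigma, 4))   # 16.25 4.8412
```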

4.3  Mean  Cell  Occupancy  Display 

The π vector for the next batch is obtained by selecting the second-oldest time slot (time = 1) and copying its gamma_pi field into the π vector.

Next, the time slot at which data is to be displayed, T/2, is selected. The σ value obtained is replicated everywhere in the segment and is then subtracted from MCO. The resulting value is compared to the x-coordinate, and points whose x-coordinate is greater than or equal to it are set to a color particular to that gate. σ is then added to MCO and that value is compared to the x-coordinate, with points less than or equal to the result set to the gate color. The result is a series of points set to a color if they are within σ of MCO. This data is then sent to the frame buffer for display.
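The two comparisons amount to coloring the pixels whose x-coordinate lies in [MCO - σ, MCO + σ]. A minimal sketch, assuming MCO and σ are already expressed in display x-coordinate units. The function name and the background value are hypothetical.

```python
def sigma_band_row(width, mco, sigma, color, background=0):
    """One display row for a gate: pixel x gets `color` when
    MCO - sigma <= x <= MCO + sigma."""
    lower = mco - sigma          # first comparison: x >= MCO - sigma
    upper = mco + sigma          # second comparison: x <= MCO + sigma
    return [color if lower <= x <= upper else background for x in range(width)]


print(sigma_band_row(10, 5.0, 1.5, 7))   # [0, 0, 0, 0, 7, 7, 7, 0, 0, 0]
```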

4.4  Probability  Field  Display 

Equation (12) yields the γ vector. As defined, each element of γ lies in [0, 1] and represents, for each HMM state i, the likelihood that the signal is in that particular state. γ can therefore be used as an estimate of track quality. The track quality is displayed for the nonzero states in greyscale at each gate location, with greater confidence in the track estimate appearing as brighter areas on the display.

Using this information, an analyst can more intelligently interpret the track trajectories displayed. In initial experiments, the probability field has proved very effective as an image enhancement device for presenting track trajectory information, and it will be explored further.


5  Performance 

HMM was first implemented on a Sun 4/260 to act as a baseline. FFT size, number of HMM states and batch sizes were selected to be representative of problems of interest, resulting in CM VP ratios of 16, with 25 gates. The HMM algorithms described above were implemented on a CM-2a with 8192 processors and 32-bit hardware floating point, a Sun 4/260 front-end, and Version 5.2 of the CM software. The CM code was instrumented to obtain performance information. A detailed geometry was defined by analyzing the data movement patterns. Little data is moved along the y-axis, and what is moved uses strictly NEWS communication that happens only once in a cycle; along the z-axis there is a requirement for data replication for various operations; the majority of the data movement occurs along the x-axis, with requirements for data replication and orientation. As a result, the CM's detailed geometry definition instruction was used with weights of 10, 1 and 8 for the x, y and z axes, respectively (only their relative ordering is significant; the actual values are not important). Table 1 provides timing information obtained from the Sun 4 execution, and Table 2 provides CM execution timings for both the default geometry and the detailed geometry.


Sun 4/260 times (sec)    Single gate    25 gates
Viterbi                  0.03           0.75
MCO                      0.05           1.25

Table 1: Processing time on Sun 4 (display and signal generation times not included).

CM (VP = 16) times (sec)    Baseline geometry    Detailed geometry    Improvement
Viterbi                     1.42                 1.28                 10%
MCO                         1.94                 1.59                 18%

Table 2: Processing times on CM, 25 gates (display and signal generation times not included).


Clearly, for a 25-gate problem, sequential execution on a Sun4 outperforms the CM implementation. It would appear that the CM can outperform the sequential implementation only by executing multiple gates in parallel, or via the MIMD model. This would indicate either that:

1.  The  particular  implementation  described  in  this  paper  was  not  able  to 
take  full  advantage  of  any  inherent  data  parallelism,  or 

2.  There  is  insufficient  data  parallelism  in  the  application. 

The  “break  even”  point  between  the  sequential  and  parallel  implementations 
is  43  gates  for  Viterbi  and  32  gates  for  MCO. 
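The break-even figures can be reproduced from Tables 1 and 2. The Python sketch below rests on two simplifying assumptions that are not stated in the text: the Sun4 time grows linearly with the number of gates (Table 1 shows the 25-gate time as exactly 25 times the single-gate time), and the CM time stays at its 25-gate value as gates are added, i.e., the VP ratio does not change. Under those assumptions the break-even counts follow from the detailed-geometry CM times, and the same data reproduce the improvement column of Table 2. The function names are hypothetical.

```python
import math

# Timings transcribed from Tables 1 and 2 (seconds).
SUN_PER_GATE = {"Viterbi": 0.03, "MCO": 0.05}   # Sun 4/260, single gate
CM_25_GATES = {                                  # CM-2a, VP ratio 16, 25 gates
    "Viterbi": {"baseline": 1.42, "detailed": 1.28},
    "MCO": {"baseline": 1.94, "detailed": 1.59},
}

def break_even_gates(sun_per_gate: float, cm_time: float) -> int:
    """Smallest gate count for which the serial Sun time reaches the CM time.

    Assumes the Sun time is linear in the number of gates and the CM time
    does not grow as gates are added (the VP ratio stays the same).
    """
    return math.ceil(cm_time / sun_per_gate)

def improvement(baseline: float, detailed: float) -> float:
    """Fractional run-time reduction of the detailed geometry over the baseline."""
    return (baseline - detailed) / baseline

for alg in ("Viterbi", "MCO"):
    cm = CM_25_GATES[alg]
    print(alg, break_even_gates(SUN_PER_GATE[alg], cm["detailed"]),
          f"{improvement(cm['baseline'], cm['detailed']):.0%}")
# Viterbi 43 10%
# MCO 32 18%
```

Using the baseline-geometry CM times instead would push the break-even points higher, which is one way to see the value of the detailed geometry.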

5.1  Enhancements 

5.1.1  Multi-line  HMM 

The CM implementation provided the opportunity to investigate enhancements to HMM. As previously discussed, HMM requires that at most one signal be present in a gate. A slight modification to the CM implementation provided a suboptimal approach supporting more than one signal in a gate. The modification consisted of staging the data so that the first stage has access to the complete frequency magnitude information, as described in previous sections. Once this first stage has made a measurement determination, the data at that frequency bin are removed, either by setting them to zero or by replacing them with noise, and the altered data are then passed, via a spread along the z-axis, to a second stage for processing.

Connection  Machine  Implementation  of  Hidden  Markov  Models  215

Second-stage processing is an exact copy of first-stage processing; the only difference is that the second stage does not have access to the complete data set. The second stage makes a measurement using the altered data; it therefore has the opportunity to find a second line in a gate if one exists. Viterbi and MCO processing then proceed as before, with each stage using its own measurements and the stages running in parallel.
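The staging just described can be sketched serially in Python. This is a hypothetical illustration, not the CM implementation: each stage's measurement determination is replaced by a simple stand-in (`pick_bin`, which returns the largest-magnitude bin), the two stages run one after the other instead of in parallel along the z-axis, and the optional `noise` callable models the "replace with noise" option.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

def pick_bin(magnitudes: Sequence[float]) -> int:
    """Hypothetical stand-in for one stage's measurement: the largest-magnitude bin."""
    return max(range(len(magnitudes)), key=lambda k: magnitudes[k])

def two_stage_measurements(gate: Sequence[float],
                           stage: Callable[[Sequence[float]], int] = pick_bin,
                           noise: Optional[Callable[[], float]] = None) -> Tuple[int, int]:
    """Run stage 1 on the full gate, remove its bin, then run stage 2 on the altered data."""
    first = stage(gate)                          # stage 1 sees the complete data
    altered = list(gate)
    altered[first] = noise() if noise else 0.0   # remove the detected bin: zero or noise
    second = stage(altered)                      # stage 2 sees only the altered data
    return first, second

gate = [0.2, 4.0, 0.3, 2.5, 0.1]
print(two_stage_measurements(gate))   # prints (1, 3)
print(two_stage_measurements(gate, noise=lambda: abs(random.gauss(0.0, 0.3))))
```

As the text notes, this staging works best when the two lines differ clearly in SNR; with nearly equal lines, the first stage's choice between them is essentially arbitrary.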

The performance of multi-line HMM depends on (1) the frequency stability of the lines within a gate and (2) the SNR difference between the lines. Preliminary results indicate that this implementation performs adequately when the SNRs of the two lines are sufficiently different. An optimal HMM approach to multi-line tracking is presented in [4].


5.1.2  “Mostly  Forward”  HMM 

The dynamic programming requirements of HMM result in an implementation that is O(T) in computational performance; i.e., a problem of size 2T will take twice as long as a problem of size T, all other parameters being equal. The sequential aspects of the problem therefore drive its performance, since the results for time t - 1 or t + 1 must be available before the forward or backward process, respectively, can proceed.

In an attempt to moderate this behavior, the Viterbi algorithm was slightly modified. The initial batch, i.e., T time samples, was treated as previously described. However, subsequent batches did not update the $\pi$ vector. In this manner it was not necessary to recalculate the T - 1 previous $\delta[t]$'s. It is therefore only necessary to calculate $\delta[T-1]$ using $z[T-1]$ and $\delta[T-2]$, going "mostly forward." The backward Viterbi process remained unchanged.

This modification improved run time performance by over 50%, resulting in an execution time of 0.55 sec for the same size problem identified in Table 1. Preliminary results with this modification were reasonable, suggesting that updating the $\pi$ vector may not be required (for comparable tracking accuracy, the batch size T must be increased slightly). A similar modification could clearly be made to the MCO tracker, with a comparable improvement in its run time performance, although such a modification has not yet been implemented.
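The "mostly forward" idea can be sketched as a Viterbi recursion that keeps its history instead of recomputing it. The Python class below is a hypothetical illustration, not the CM code: it works with log probabilities, takes a fixed initial-state vector (the π vector) that is never re-estimated, and receives per-sample log observation likelihoods `log_b` in place of the measurements z[t]. Each new sample costs one forward step, computing the newest δ from the previous one, while the backward (traceback) pass still covers the last T samples. The optional `history` bound on stored backpointers is an addition made here.

```python
import math
from collections import deque
from typing import Deque, List, Optional, Sequence

class MostlyForwardViterbi:
    """Log-domain Viterbi with an incremental forward pass (sketch)."""

    def __init__(self, log_pi: Sequence[float], log_a: Sequence[Sequence[float]],
                 history: Optional[int] = None):
        self.log_pi = list(log_pi)                  # fixed; not updated for later batches
        self.log_a = [list(row) for row in log_a]   # log_a[i][j] = log P(state j | state i)
        self.delta: List[float] = []                # delta for the most recent sample
        self.back: Deque[List[int]] = deque(maxlen=history)  # one backpointer row per sample

    def step(self, log_b: Sequence[float]) -> None:
        """Forward step: new delta from the new measurement and the previous delta."""
        n = len(self.log_pi)
        if not self.delta:
            self.delta = [self.log_pi[j] + log_b[j] for j in range(n)]
            self.back.append([-1] * n)
            return
        new_delta, ptr = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: self.delta[i] + self.log_a[i][j])
            new_delta.append(self.delta[i_best] + self.log_a[i_best][j] + log_b[j])
            ptr.append(i_best)
        self.delta = new_delta
        self.back.append(ptr)

    def traceback(self, T: int) -> List[int]:
        """Backward pass: most likely state sequence for the last T samples."""
        if not 1 <= T <= len(self.back):
            raise ValueError("T must be between 1 and the stored history length")
        state = max(range(len(self.delta)), key=self.delta.__getitem__)
        path = [state]
        for t in range(len(self.back) - 1, len(self.back) - T, -1):
            state = self.back[t][state]
            path.append(state)
        return path[::-1]
```

For example, with two "sticky" states and three samples favoring state 0 followed by three favoring state 1, `traceback(6)` returns `[0, 0, 0, 1, 1, 1]`.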

6  Conclusions 

Clearly it is possible to implement a dynamic programming model, such as HMM, on the CM. Its performance will be strongly affected by the number of dependent steps required to solve the problem. While some improvements that alleviated this "problem" were identified, such as the "mostly forward" approach, these improvements would be just as applicable to a purely sequential implementation, such as that implemented on the Sun, with a corresponding improvement in run time performance.


While previous CM efforts at NUSC have achieved dramatic improvements by replicating data, it was not possible to obtain a corresponding benefit with the straightforward, intuitive implementation described in this paper. Data replication did not perform as well as initially expected because (1) there is no convenient method for replicating data within a segment other than combining a scan.with.oper followed by a scan.with.copy (where oper represents some arithmetic operation), and (2) the data reference patterns, as implemented, required several vector transpose operations requiring remote sends. Future efforts may focus on a different topology that keeps information in in-processor arrays and thereby avoids some of the NEWS communication (beneficial only if aref costs are less than NEWS communication costs), or on an implementation that avoids vector transpositions. Neither of these approaches, however, truly exploits the data parallelism model exemplified by the CM. What is required is an approach that considers the input measurement sequence space as a unit and treats the output probability field accordingly. This is a topic for future work.

There is also a need to enhance the HMM approach via (1) development of an HMM capable of handling more than a single line in a gate (the multi-line approach presented above is a viable interim "workaround"), (2) development of an HMM capable of handling lines that cross, (3) development of an HMM in which track termination is sensitive to "direction," i.e., distinguishes leaving the gate on the left or right, which would enable initiation of a new track at the adjacent gate, from a true loss of signal, (4) an implementation with adaptive gates, i.e., gates that are not assigned a priori but are defined in situ where a measurement is made, with the measurement placed in the center of the gate, and finally (5) an implementation of HMM that can take advantage of the amplitude and phase information inherent in the data [3].

In addition, the CM implementation provides the opportunity to run multiple SNR hypotheses in parallel (recall that the B matrix is sensitive to SNR). This could be accomplished by running each of the candidate SNRs along a different dimension of a CM geometry. The data fusion aspect of such a multiple model approach is an additional area for future investigation.

Experience garnered from this and related CM efforts at NUSC has resulted in the identification of CM capabilities for possible future consideration: (1) VP ratios other than powers of 2, since a small change in the size of a problem can have significant effects on its performance, (2) higher-order languages that support both implicit and explicit communication, (3) specification of non-Cartesian geometries, e.g., spherical or cylindrical, that could efficiently utilize the processing and communication resources, (4) improved debugging aids, (5) performance monitoring and code instrumentation (profiling), (6) efficient implementation of a "segmented spread," i.e., copying a datum from an identified coordinate everywhere within a segment (spread works along an axis), and finally (7) application-specific subroutine packages, e.g., for image processing and signal processing.
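The segmented spread requested in item (6), and the workaround described earlier (a scan-with-oper followed by a scan-with-copy), can be emulated serially as below. This is a hypothetical Python sketch, not the CM Paris or *Lisp API: segments are marked by a boolean `seg_start` flag, exactly one element per segment is flagged as the source, and the values are real numbers. A segmented max-scan carries the source value to the end of its segment, and a backward segmented copy-scan then copies it over the whole segment.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")

def segmented_scan(values: Sequence[T], seg_start: Sequence[bool],
                   op: Callable[[T, T], T]) -> List[T]:
    """Inclusive segmented scan: the running result restarts wherever seg_start is True."""
    out: List[T] = []
    acc: Optional[T] = None
    for v, start in zip(values, seg_start):
        acc = v if (start or acc is None) else op(acc, v)
        out.append(acc)
    return out

def copy_scan(values, seg_start):
    """Segmented scan-with-copy: each element receives the first value of its segment."""
    return segmented_scan(values, seg_start, lambda acc, _v: acc)

def segmented_spread(values: Sequence[float], seg_start: Sequence[bool],
                     source: Sequence[bool]) -> List[float]:
    """Copy the value at each segment's single flagged source to the whole segment."""
    n = len(values)
    masked = [v if s else float("-inf") for v, s in zip(values, source)]
    fwd = segmented_scan(masked, seg_start, max)   # scan-with-max: source value reaches segment end
    seg_end = [i == n - 1 or seg_start[i + 1] for i in range(n)]
    back = copy_scan(fwd[::-1], seg_end[::-1])     # backward scan-with-copy from segment ends
    return back[::-1]

print(segmented_spread([1, 2, 3, 4, 5, 6],
                       [True, False, False, True, False, False],
                       [False, True, False, False, False, True]))
# prints [2, 2, 2, 6, 6, 6]
```

Two full passes over the data are needed here, which illustrates why a native segmented spread was on the authors' list.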


Acknowledgements 

The  authors  would  like  to  especially  recognize  Mr.  R.  Bernecky  for  his 
significant  contributions  to  this  effort.  His  insight  into  the  problem  domain, 
as  a  result  of  similar  work,  and  the  many  hours  of  discussions  provided  are 
appreciated.  The  authors  would  also  like  to  thank  Mr.  R.  Kneipfer  for  his 
suggestions  on  this  manuscript. 

References 

[1] R. L. Streit and R. F. Barrett, "Frequency Line Tracking Using Hidden Markov Models," IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. 38, No. 4, pp. 586-598, April 1990.

[2] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley Publishing Company, pp. 67-69, 1974.

[3] R. F. Barrett and R. L. Streit, "Frequency Line Tracking Using Hidden Markov Models with Phase Information," Proceedings of the Second International Symposium on Signal Processing and Its Applications, Gold Coast, Australia, 27-31 August 1990.

[4] X. Xianya and R. J. Evans, "Multiple Frequency Line Tracking Using Hidden Markov Models," Technical Report EE9007, University of Newcastle, New South Wales, Australia, February 1990.


ARTIFICIAL  NEURAL  NETWORK  STUDIES 


Foreword 


Artificial neural networks (NNs) are specialized computer architectures for classifying multivariate feature sets. Paper [22] establishes that feed-forward NN architectures can implement asymptotically optimum classifiers, when properly trained, and that maximum likelihood estimation methods can be used as NN training algorithms. The statistical approach to NN classification and training is pursued much further in paper [23]. In this paper, particular attention is given to training algorithms suited to small data problems, that is, to problems in which only a small amount of training data is available for one or more classes. The approach enables general nonlinear discrimination, and it generalizes Fisher's classical method for linear discrimination. A maximum likelihood NN training algorithm is derived, using a variant of the Expectation-Maximization algorithm. Paper [24] shows how to use the output of the maximum likelihood training phase to develop maximum entropy estimates for the class a priori probabilities.




A  Neural  Network  For  Optimum 
Neyman-Pearson  Classification 

R.  L.  Streit 




A  NEURAL  NETWORK  FOR  OPTIMUM  NEYMAN-PEARSON  CLASSIFICATION 


Roy  L.  Streit

Naval  Underwater  Systems  Center 
New  London  Laboratory 
New  London,  CT  06320 


ABSTRACT 

A  three-layer  feed-forward  neural  network  (NN)  that  implements  the 
optimum  Neyman-Pearson  (N-P)  classifier  is  described.  This  NN  is  useful 
whenever  it  is  appropriate  to  characterize  (1)  input  classes  as  multivariate 
random  variables,  and  (2)  input  data  vectors  as  realizations  of  one  of  the 
multivariate  random  variables.  The  purpose  of  the  NN  is  thus  simply  to 
compute  the  conditional  likelihoods  necessary  for  the  N-P  classifier. 
Because  the  N-P  classifier  is  optimal,  the  classification  performance  of  the 
NN  is  optimal  too.  Therefore,  three-layer  feed-forward  NN  classifiers  can 
equal  but  not  exceed  the  performance  of  the  well  known  N-P  classifier. 

The  optimal  N-P  classifier  requires  multivariate  probability  density 
functions  (PDFs)  characterizing  the  input  classes.  Class  PDFs  are 
approximated  (arbitrarily  closely)  by  mixtures  of  multivariate  Gaussian  PDFs. 
Supervised  training  of  the  class  PDFs  from  input  data  vectors  is,  thus, 
equivalent  to  training  the  NN.  Maximum  likelihood  training  of  the  PDFs  is 
performed  by  the  EM  algorithm  (or  by  any  other  suitable  optimization  method). 


1.  INTRODUCTION 

The discussion of NNs in this paper is limited to the fundamental problem of classification, or recognition, of a given input vector as one of several possible outcome pattern classes. For the purposes of this paper, then, a NN is a specialized device for pattern recognition. The discussion is also limited to conventional feed-forward three-layer NNs. A node is a computational unit that forms a weighted sum of all its inputs and then passes the result through a nonlinearity. The nonlinearity is characterized by a real-valued nonlinear function h, together with a threshold $\tau$. Thus the output of a node is the numerical value of the function h when its argument is the weighted summation minus the threshold $\tau$. Different nonlinear functions h have been proposed; typically, they are monotone non-decreasing and asymptotically constant for large arguments (positive and negative). The weights used to form the weighted sum within a node are called the interconnection weights. They are uniquely defined for each interconnection, that is, for each communication link between node and node and between node and input. The weights can be positive, negative, or zero. The problem of assigning appropriate interconnection weights and nodal thresholds is known as the training problem.
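As a concrete illustration of the node definition above, the short Python sketch below computes a node output as h applied to the weighted input sum minus the threshold. The logistic function used for h is only one example of a monotone, asymptotically constant nonlinearity; it is not the nonlinearity used later in this paper, and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence

def node_output(inputs: Sequence[float], weights: Sequence[float],
                tau: float, h: Callable[[float], float]) -> float:
    """Output of one node: h(weighted sum of the inputs minus the threshold tau)."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return h(s - tau)

def logistic(x: float) -> float:
    """One common monotone, asymptotically constant choice for h (an example only)."""
    return 1.0 / (1.0 + math.exp(-x))

print(node_output([1.0, 2.0], [0.5, 0.25], 1.0, logistic))   # prints 0.5
```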

The maximum likelihood methodology presented in this paper provides a unified theoretical viewpoint for understanding NNs and for developing computationally effective training algorithms. The methodology is rooted in a classical statistical and mathematical framework that is esthetically satisfying and potentially very significant for the future development of NN architectures. One significant advantage of this methodology is that the important "exclusive-or" classification problem is easily solved.

N-P classification requires knowledge of the PDF of the input data vector X, conditioned on each of the various class membership hypotheses. For M outcome classes, N-P classification requires M conditional PDFs. Let $g_j(X)$ denote the conditional PDF of the j-th class, and let $\alpha_j$ denote the a priori probability of class j. Then the N-P classification estimate for the input vector X is the class $J^*$ for which

$$ \alpha_{J^*}\, g_{J^*}(X) = \max_{1 \le j \le M} \alpha_j\, g_j(X). \tag{1} $$

Since the $\{\alpha_j\}$ are typically unknown, they are often treated as free parameters and are used to adjust the probability of incorrect classification. N-P classification is optimum in the sense that, for a given probability of incorrect classification, the probability of correct classification is maximized. Hypothesis testing is described in many places, e.g., [1].
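Rule (1) amounts to an argmax over prior-weighted class likelihoods. The Python sketch below is illustrative only: it assumes the class PDFs are available as callables, uses hypothetical one-dimensional Gaussian class densities, and indexes classes from 0. The second call shows how treating the a priori probabilities as free parameters shifts the decision.

```python
import math
from typing import Callable, Sequence

def gaussian_pdf(mean: float, var: float) -> Callable[[float], float]:
    """Hypothetical 1-D class PDF, used only for illustration."""
    return lambda x: math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def np_classify(x: float, priors: Sequence[float],
                class_pdfs: Sequence[Callable[[float], float]]) -> int:
    """Return the (0-based) class index maximizing prior * class likelihood, as in (1)."""
    scores = [a * g(x) for a, g in zip(priors, class_pdfs)]
    return max(range(len(scores)), key=scores.__getitem__)

pdfs = [gaussian_pdf(0.0, 1.0), gaussian_pdf(3.0, 1.0)]
print(np_classify(2.0, [0.5, 0.5], pdfs))       # prints 1 (equal priors)
print(np_classify(2.0, [0.999, 0.001], pdfs))   # prints 0 (priors as free parameters)
```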

The  NN  presented  in  this  paper  requires  that  the  output  layer  have  M 
nodes,  and  that  each  output  node  evaluate  one  of  the  outcome  class 
conditional  PDFs.  Estimation  of  the  conditional  PDFs  is  therefore  a  central 
issue.  In  this  paper,  conditional  PDFs  are  approximated  by  mixtures  of 
multivariate  Gaussian  PDFs.  (It  is  well  known  that  very  general  PDFs  can  be 
approximated  to  arbitrary  accuracy  by  Gaussian  mixtures.)  Both  the  training
algorithm  and  the  NN  structure  are  affected  by  this  approximation.  The  PDFs 
of  the  Gaussian  components  in  the  mixtures  are  computed  by  the  first 
two,  or  hidden,  layers. 

The NN presented in this paper is trained by using the training data to estimate the outcome class mixture PDFs. Knowledge of the class PDFs enables the design of a NN architecture whose performance can be readily tested without building the NN. The supervised training algorithm is an established maximum likelihood algorithm for estimating mixture PDFs known as the EM (Expectation-Maximization) algorithm [2].

The  NN  described  in  [3]  is  in  some  ways  similar  to  the  NN  described  in 
this  paper.  For  instance,  both  approaches  model  the  output  classes 
statistically  using  mixtures  of  multivariate  Gaussian  PDFs.  However,  they 
also  differ  in  significant  ways.  In  [3],  the  number  of  first  layer  nodes 
equals  the  number  of  training  vectors,  and  the  interconnection  weights 
between  the  inputs  and  the  first  layer  nodes  are  proportional  to  the 
components  of  the  training  vectors.  In  this  paper,  however,  the  number  of 
first  layer  nodes  is  independent  of  the  size  of  the  training  set,  and  the 
interconnection  weights  are  nontrivial  functions  of  the  training  vectors. 


2.  NEYMAN-PEARSON  NEURAL  NETWORK  DESCRIPTION 
2.1  The  Output  Layer

In the context of this paper, an output node is conceptually identified with an outcome class hypothesis. The defined purpose of an output node is thus to compute a numerical value equal to the likelihood of the input vector X, conditioned on an outcome class hypothesis characterized by a multivariate mixture PDF. The NN performs N-P classification by selecting the outcome class of the output node having the largest numerical value.

It  is  assumed  in  this  section  that  the  multivariate  PDFs  for  every 
component  population  in  the  various  class  mixture  PDFs  are  computed  by  the 
second  hidden  layer.  The  design  of  the  hidden  layers  to  accomplish  the 
required  PDF  evaluations  is  described  for  Gaussian  components  in  Section  2.2. 


Let G denote the total number of different components in the outcome class mixture PDFs; that is, each of the M possible outcome classes comprises some combination of the G population components. The number of required second hidden layer nodes is therefore G. Let $p_i(X)$ denote the multivariate probability density function of the i-th mixture component, and let $\pi_{ij}$ denote the proportion of population i in outcome class j. The numbers $\pi_{ij}$ are non-negative and satisfy the equations

$$ \sum_{i=1}^{G} \pi_{ij} = 1, \qquad j = 1, \ldots, M. \tag{2} $$


It follows that the output of the j-th output layer node is given by

$$ g_j(X) = \sum_{i=1}^{G} \pi_{ij}\, p_i(X), \qquad j = 1, \ldots, M. \tag{3} $$


The output from the second hidden layer, given the input vector X, is the set of numbers $\{p_i(X)\}$. Consequently, to enable the NN to compute the M class PDFs $\{g_j\}$ in (3), the interconnection weight between the i-th node in the second hidden layer and the j-th output node must be the mixing proportion $\pi_{ij}$. Estimating the class PDFs thus sets these NN weights directly.


Because the NN is intended to be the optimum N-P classifier, the nodal function of an output node is the linear function

$$ h_3(x) = \alpha x, \tag{4} $$

for all real values x, where $\alpha$ is the a priori probability of the output class (i.e., $\alpha = \alpha_j$ for the j-th output node, as in (1)). The output node thresholds are zero for N-P classification.
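The output layer defined by (2) through (4) is a set of zero-threshold weighted sums followed by a linear nodal function. Below is a minimal Python sketch, assuming the second-hidden-layer outputs p_i(X) are already available as numbers; the names `output_layer`, `mix`, and `alpha` are hypothetical, and classes are indexed from 0.

```python
from typing import List, Sequence

def output_layer(p: Sequence[float], mix: Sequence[Sequence[float]],
                 alpha: Sequence[float], tol: float = 1e-9) -> List[float]:
    """Output-layer values alpha_j * g_j(X), one per outcome class.

    p[i]      : second-hidden-layer output p_i(X), i = 0..G-1
    mix[i][j] : interconnection weight pi_ij (mixing proportion)
    alpha[j]  : a priori probability used in the linear output nodal function (4)
    """
    G, M = len(mix), len(mix[0])
    for j in range(M):                                   # constraint (2)
        col = sum(mix[i][j] for i in range(G))
        if any(mix[i][j] < 0.0 for i in range(G)) or abs(col - 1.0) > tol:
            raise ValueError(f"mixing proportions for class {j} violate (2)")
    g = [sum(mix[i][j] * p[i] for i in range(G)) for j in range(M)]   # (3), zero threshold
    return [alpha[j] * g[j] for j in range(M)]                         # (4)

print(output_layer([0.2, 0.5, 0.1], [[0.5, 0.0], [0.5, 0.0], [0.0, 1.0]], [0.5, 0.5]))
```

Selecting the index of the largest returned value gives the N-P decision; for the example values the class scores are about 0.175 and 0.05, so class 0 is chosen.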


2.2  Gaussian  Component  Architecture  For  The  Hidden  Layers

The purpose of a node in the second hidden layer is to compute and pass to the output layer a numerical value equal to the likelihood of the input vector X (of length N) under the hypothesis that it is a realization of a component population in one of the mixture outcome classes. Consequently, when the component population is a multivariate Gaussian with mean vector $\mu$ and positive definite covariance matrix $\Sigma$, the output $p(X)$ of a second layer node has the mathematical form

$$ p(X) = \left[(2\pi)^N |\Sigma|\right]^{-1/2} \exp\left\{-\tfrac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)\right\}, \tag{5} $$

where $|\Sigma|$ denotes the determinant of the matrix $\Sigma$. The mean vector $\mu$ and covariance matrix $\Sigma$ are estimated during the NN training phase. One way to compute $p(X)$ within the confines of the hidden layers of a conventional NN architecture is described in this section.

The covariance matrix $\Sigma$ of the Gaussian distribution (5) is positive definite, so it has the Cholesky factorization $\Sigma = L L^T$, where the matrix L is lower triangular (i.e., the entries of L above the diagonal are zero) and has positive diagonal entries, and where $^T$ denotes the matrix transpose. Substituting into (5) and using $\Sigma^{-1} = (L^{-1})^T L^{-1}$ gives

$$ p(X) = \left[(2\pi)^N |\Sigma|\right]^{-1/2} \exp\left\{-\tfrac{1}{2}\left\| L^{-1}X - L^{-1}\mu \right\|^2\right\}, \tag{6} $$

where $\|\cdot\|$ is the usual Euclidean norm on $\mathbb{R}^N$.

The second layer nodes are required to have no inputs in common. This requirement probably increases the number of first layer nodes in the overall NN, but it does not fundamentally alter the NN structure. That is, a fully-connected NN in which certain of the interconnection weights between the hidden layer nodes are set to zero satisfies the requirement.


The  hidden  layer  architecture  from  the  input  layer  to  a  second  layer 
node  is  characterized  by  the  following  quantities: 

1.  Nodal  nonlinear  functions  in  both  layers 

2.  Nodal  thresholds  in  both  layers 

3.  Number  of  first  layer  nodes 

4.  Interconnection  weights  between  the  input  and  first  layer 

5.  Interconnection  weights  between  first  and  second  layers. 

In the following paragraphs, these quantities are explicitly defined in terms of the mean $\mu$ and covariance $\Sigma$ of the Gaussian PDF.


Implementation of expression (6) within the hidden layers requires a different nodal nonlinearity in each layer. The nonlinear function required for a first hidden layer node is given by

$$ h_1(x) = x^2. \tag{7} $$

The first layer nonlinear function is thus especially simple to implement. The nonlinear function required for a second hidden layer node is given by

$$ h_2(x) = \left[(2\pi)^N |\Sigma|\right]^{-1/2} \exp\left(-\tfrac{1}{2}x\right). \tag{8} $$

These nodal nonlinear functions are different from those typically utilized in NN applications.

Expression (6) yields a mathematical expression for the thresholds of the first hidden layer nodal nonlinearities in terms of the trained parameters $\mu$ and $\Sigma$ characterizing the Gaussian component. Specifically, the threshold $\tau_i$ for the i-th first layer node is the i-th component of the vector $L^{-1}\mu$:

$$ \tau_i = \left(L^{-1}\mu\right)_i, \qquad i = 1, \ldots, N. \tag{9} $$

Thresholds for the second layer nodes are identically zero.


Expression (6) implies that the number of first layer nodes connected to a given second layer node is equal to N, the dimension of the input vector X. Because the G second layer nodes are required to have no first layer input nodes in common, the total number of first layer nodes in the overall NN is equal to the product $N \cdot G$.


Expression (6) also implies that the interconnection weights between the components of the input vector X and the nodes of the first hidden layer are determined by the components of the inverse of the Cholesky factor L. This follows from the nature of the argument of the norm $\|\cdot\|$ in (6). The interconnection weight between the i-th input and the j-th first layer node is therefore the (j, i) component of $L^{-1}$, i.e., the entry in row j and column i. About half of these weights are zero because $L^{-1}$, like L, is lower triangular. The interconnection weights between the two hidden layers are all unity, because the sum (implicit in the squared norm in (6)) is unweighted.
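The hidden-layer construction above can be checked numerically. The pure-Python sketch below (standard library only, so the Cholesky factorization and the triangular inverse are written out) builds, for one Gaussian component, the input-to-first-layer weights from $L^{-1}$, the first-layer thresholds as the components of $L^{-1}\mu$, the squaring nonlinearity (7), and the scaled exponential second-layer nonlinearity implied by (6), with unit weights and zero threshold into the second layer. It is a sketch under these assumptions; the function names and the weight layout `W[j][i]` (input i to first-layer node j) are choices made here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]

def cholesky(S: Sequence[Sequence[float]]) -> Matrix:
    """Lower-triangular L with S = L L^T (S must be positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def lower_inverse(L: Matrix) -> Matrix:
    """Inverse of a lower-triangular matrix by forward substitution (also lower triangular)."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):                       # solve L y = e_c, column by column
        for r in range(n):
            rhs = (1.0 if r == c else 0.0) - sum(L[r][k] * inv[k][c] for k in range(r))
            inv[r][c] = rhs / L[r][r]
    return inv

def gaussian_component_layers(mu: Sequence[float], Sigma: Sequence[Sequence[float]]
                              ) -> Tuple[Matrix, List[float], Callable[[Sequence[float]], float]]:
    """Weights, thresholds, and forward map from the inputs to one second-layer node."""
    N = len(mu)
    L = cholesky(Sigma)
    W = lower_inverse(L)                                               # W[j][i] = (L^-1)[j][i]
    tau = [sum(W[j][i] * mu[i] for i in range(N)) for j in range(N)]   # components of L^-1 mu
    det = math.prod(L[i][i] for i in range(N)) ** 2                    # |Sigma| = (prod diag L)^2
    scale = ((2.0 * math.pi) ** N * det) ** -0.5

    def h1(x: float) -> float:        # first-layer nonlinearity (7)
        return x * x

    def h2(x: float) -> float:        # second-layer nonlinearity implied by (6)
        return scale * math.exp(-0.5 * x)

    def forward(X: Sequence[float]) -> float:
        first = [h1(sum(W[j][i] * X[i] for i in range(N)) - tau[j]) for j in range(N)]
        return h2(sum(first))         # unit weights into the second layer, zero threshold

    return W, tau, forward
```

Comparing `forward(X)` with a direct evaluation of (5) for a small positive definite covariance is a quick consistency check; the two agree to rounding error.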


3.  TRAINING  THE  NEURAL  NETWORK 

Supervised  training  in  the  context  of  this  paper  means  that  the  correct 
outcome  class  of  each  data  vector  in  the  training  set  is  known  beforehand. 
Thus,  an  outcome  class  is  known  only  from  a  collection  of  data  vectors,  and 
training  proceeds  by  using  these  vectors  to  estimate  the  defining  parameters 
of  the  outcome  class  mixture  PDF.  The  algorithm  proposed  in  this  paper  is 
the  EM  algorithm  that  was  originally  described  in  [2].  The  EM  algorithm  is 
applicable  to  very  general  mixture  PDF  estimation  problems,  but  it  takes  on 
an  especially  simple  form  when  applied  to  multivariate  Gaussian  PDFs. 
Details  of  the  EM  algorithm  for  Gaussian  and  non-Gaussian  PDFs  are  given  in
[2]  and  [4]. 

G  denotes  the  maximum  number  of  different  component  populations  in  the 
mixture  PDF  of  a  particular  outcome  class.  If  G  is  initially  unknown,  it 
must  be  selected  in  some  fashion  and  adapted  as  necessary.  The  number  G  is
not  estimated  by  the  EM  algorithm.  After  training  is  completed,  however,  G 
can  be  changed  as  desired,  and  training  restarted. 

Let $\mathcal{X} = \{x_1, \ldots, x_T\}$ denote the set of data vectors in the training set corresponding to a given outcome class, and let T denote the number of vectors in $\mathcal{X}$. It is assumed that the vectors in $\mathcal{X}$ are realizations of independent, identically distributed trials of the outcome class. Let $\pi_i$ denote the mixing proportion of the i-th component population, and let the i-th multivariate Gaussian component PDF have mean vector $\mu_i$ and covariance matrix $\Sigma_i$. The parameter vector characterizing the mixture PDF is $\lambda = \{(\pi_i, \mu_i, \Sigma_i),\ i = 1, \ldots, G\}$. The likelihood function $\mathcal{L}(\mathcal{X}\,|\,\lambda)$ is defined by the product

$$ \mathcal{L}(\mathcal{X}\,|\,\lambda) = \prod_{t=1}^{T} g(x_t\,|\,\lambda), \tag{10} $$

where the mixture PDF $g(X\,|\,\lambda)$, for any $X \in \mathbb{R}^N$, is given by (cf. (3) and (5))

$$ g(X\,|\,\lambda) = \sum_{i=1}^{G} \pi_i \left[(2\pi)^N |\Sigma_i|\right]^{-1/2} \exp\left\{-\tfrac{1}{2}(X-\mu_i)^T \Sigma_i^{-1} (X-\mu_i)\right\}. \tag{11} $$

It is clear that computing a (global) maximum likelihood estimate for $\lambda$ is a highly nonlinear problem. In addition, the likelihood function $\mathcal{L}(\mathcal{X}\,|\,\lambda)$ often has distinct local maxima. In practice, an appropriately selected local maximum is usually used instead of the global maximum.

The EM algorithm is an iterative algorithm that computes stationary points of the likelihood function (10) without taking gradients, or derivatives. It begins with an initial guess, say $\lambda$, for the optimum parameter vector, and each iteration gives a new parameter vector, say $\hat{\lambda}$, that is guaranteed to increase the value of the likelihood (10) unless $\lambda$ is a stationary point of $\mathcal{L}$, in which case $\hat{\lambda} = \lambda$. Consequently, if the EM iterates are bounded, the EM algorithm must converge to a stationary point of $\mathcal{L}$. In practice, stationary points found by the EM algorithm are usually also local maxima. By restarting the EM algorithm with different initial guesses $\lambda$, and choosing the best of the local maxima so obtained, a satisfactory maximum likelihood estimate for $\lambda$ can be found. Convergence rates of the EM algorithm are discussed in [4].
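For Gaussian components the EM iteration has a simple closed form. The sketch below is a minimal Python illustration restricted to scalar data (N = 1) so that it needs only the standard library; the function names are hypothetical, and the variance floor is a numerical safeguard that is not part of the source (if it ever activates, the monotonicity guarantee no longer strictly applies). Each call to `em_step` performs one E-step (posterior responsibility of each component for each training value) and one M-step (responsibility-weighted re-estimates of the proportions, means, and variances), and `log_likelihood` evaluates the logarithm of (10), which should not decrease from one iteration to the next.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[List[float], List[float], List[float]]   # (proportions, means, variances)

def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x: float, params: Params) -> float:
    """Scalar (N = 1) version of the mixture PDF (11)."""
    w, mu, var = params
    return sum(wi * normal_pdf(x, mi, vi) for wi, mi, vi in zip(w, mu, var))

def log_likelihood(data: Sequence[float], params: Params) -> float:
    """Logarithm of the likelihood (10)."""
    return sum(math.log(mixture_pdf(x, params)) for x in data)

def em_step(data: Sequence[float], params: Params, var_floor: float = 1e-9) -> Params:
    w, mu, var = params
    G, T = len(w), len(data)
    resp = []                                    # E-step: resp[t][i] = P(component i | x_t)
    for x in data:
        p = [w[i] * normal_pdf(x, mu[i], var[i]) for i in range(G)]
        total = sum(p)
        resp.append([pk / total for pk in p])
    new_w, new_mu, new_var = [], [], []          # M-step: normalized weighted counts
    for i in range(G):
        n_i = sum(r[i] for r in resp)
        m_i = sum(r[i] * x for r, x in zip(resp, data)) / n_i
        v_i = sum(r[i] * (x - m_i) ** 2 for r, x in zip(resp, data)) / n_i
        new_w.append(n_i / T)
        new_mu.append(m_i)
        new_var.append(max(v_i, var_floor))      # safeguard against collapse (not from the source)
    return new_w, new_mu, new_var
```

Running several restarts from different initial parameters and keeping the result with the largest log-likelihood follows the restart strategy described above.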


4.  CONCLUDING  REMARKS 

Training  the  NN  architecture  described  in  this  paper  can  be  performed 
using  readily  available  computing  resources.  This  is  a  significant  feature, 
and  it  is  a  consequence  of  the  general  statistical  methodology  described 
herein.  Explicit  formulae  for  nodal  thresholds  and  interconnection  weights
in  terms  of  the  estimated  mean  vectors  and  covariance  matrices  of  the 
Gaussian  mixture  components  are  given.  These  formulae  eliminate  the  need  to 
train  the  NN  explicitly,  and  refocus  the  training  effort  directly  onto 
established  statistical  algorithms  for  estimating  mixture  PDFs. 

Alternative  NN  implementations  for  Gaussian  PDFs  are  possible.  In 
addition,  NNs  can  be  designed  for  centrally  symmetric  non-Gaussian 
distributions.  Details  are  given  in  [5]  and  [6].  A  maximum  likelihood 
method  for  unsupervised  training  is  also  discussed  in  [7]. 

The  EM  algorithm  is  a  normalized  counting  algorithm,  and  it  is  readily 
incorporated  into  adaptive  schemes  for  updating  the  NN  weights  and  thresholds 
as  new  data  is  added  to  the  training  set  or,  alternatively,  as  old  data  is 
deleted.  The  advantages  of  such  adaptive  training  methods  for  NN 
applications  are  potentially  significant.  Feedback  from  the  training  data 
into  the  defining  parameters  of  the  NN  described  in  the  paper  is  greatly 
facilitated  by  the  computational  simplicity  of  the  EM  training  algorithm. 


REFERENCES 

1. B. W. Lindgren, Statistical Theory, Third Ed., Macmillan, New York, 1976.

2. A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood From Incomplete Data via the EM Algorithm (with discussion)," Journal of the Royal Statistical Society, Series B, Vol. 39, 1977, pp. 1-38.

3. D. F. Specht, "Probabilistic Neural Networks For Classification, Mapping, or Associative Memory," Proceedings of the IEEE International Conference on Neural Networks, July 1988, Vol. 1, pp. 525-532.

4. G. J. McLachlan and K. E. Basford, Mixture Models: Inference and Applications to Clustering, Marcel Dekker, New York, 1988.

5.  R.  L.  Streit,  "A  Dual  Neural  Network  Architecture  For  Gaussian 
Components  of  a  Mixture  Density  Function,"  Naval  Underwater  Systems 
Center  Invention  Disclosure,  24  March  1989. 

6.  R.  L.  Streit,  "Neural  Network  Architectures  For  Non-Gaussian  Components 
of  a  Mixture  Density  Function,"  Naval  Underwater  Systems  Center  Invention 
Disclosure,  24  March  1989. 

7.  R.  L.  Streit,  "A  Neural  Network  For  Maximum  Likelihood  Classification 
With  Supervised  And  Unsupervised  Training  Capability,"  Naval  Underwater 
Systems  Center  Invention  Disclosure,  24  March  1989. 




Maximum  Likelihood  Training 
Of  Probabilistic  Neural  Networks 


R.  L.  Streit  and  T.  E.  Luginbuhl 




Abstract 


A maximum likelihood method is presented for training probabilistic neural
networks (PNNs) using a Gaussian kernel, or Parzen window. The proposed
training algorithm enables general nonlinear discrimination and is a
generalization of Fisher’s method for linear discrimination. Important features
of maximum likelihood training for PNNs are that (1) it economizes the
well-known Parzen window estimator while preserving the feed-forward NN
architecture, (2) it utilizes class pooling to generalize classes represented by
small training sets, (3) it gives smooth discriminant boundaries that often are
"piece-wise flat" for statistical robustness, (4) it is very fast computationally
compared to back-propagation, and (5) it is numerically stable. The
effectiveness of the proposed maximum likelihood training algorithm is
assessed using nonparametric statistical methods to define tolerance intervals
on PNN classification performance.




1  Introduction 


Classification  is  the  following  decision  problem:  given  an  input  vector  X,  decide 
to  which  of  several  known  classes  the  input  X  belongs.  The  classes  are  assumed 
to  be  mutually  exclusive  and  exhaustive.  Useful  characterizations  of  the  classes 
are  assumed  to  be  either  unknown  or  unavailable  and  must  be  estimated  from  a 
given  collection  of  labelled  training  samples  (i.e.,  input  vectors  corresponding  to 
each  class).  The  absence  of  a  priori  class  characterizations  is  the  major  difficulty  in 
classification. 

The  training  samples  available  for  each  class  reflect  the  intrinsic  variability  of 
the  class.  Measurement  errors  are  normally  present  in  the  training  samples  also, 
but  such  errors  are  subsumed  here  in  the  guise  of  class  variability.  Class  variability 
models  developed  in  this  paper  are  based  on  the  following  fundamental  assumptions: 

1. Each class is a multivariate random variable with a continuous class conditional
probability density function (PDF).

2.  Every  input  vector  X  is  a  realization  of  one  of  the  classes. 

3.  Each  vector  in  the  training  sample  set  T  is  a  realization  of  the  random  variable 
corresponding  to  its  class  label. 

From the first two assumptions it follows that the well-known Bayesian classifier [1,
Chapter 13] is the optimum classifier in the sense of minimizing the overall
misclassification risk. Section II defines the homoscedastic Gaussian mixture
approximation to the optimum classifier and discusses its implementation in a
probabilistic neural network (PNN) structure.

Obtaining meaningful class conditional PDF estimates for classes represented
by only a few samples in the training set T is a difficult problem, but one that
occurs often in practice. We treat the small sample size problem by a new
sample pooling method that generalizes a classical statistical technique due to
Fisher [2, Chapter 4]. We refer to this new method as Generalized Fisher (GF)
training, and it yields (local) maximum likelihood estimates of the class
conditional PDFs. GF training is discussed in Section III and derived in the
Appendix. A discussion of the training of a priori class probabilities and
misclassification costs is given in Section IV. Examples of GF training on small,
moderate, and large size training sets T are presented in Section V, and the
effectiveness of GF training for these examples is assessed in Section VI using a
nonparametric statistical method called tolerance intervals. Concluding remarks
are given in Section VII.

2 Neural Network Implementation of Mixture Gaussian PNNs

The purpose of this section is to show that a four layer feed-forward PNN using
a general Gaussian kernel, or Parzen window, can implement exactly the general
homoscedastic Gaussian mixtures used in this paper to approximate the optimum
classifier. Maximum likelihood training of the PNN is discussed below in Sections III
and IV. The structure of the required PNN is represented in Figure 1. Each
component of this PNN has a specific interpretation and, moreover, all the
interconnection weights and nodal thresholds are given explicitly by mathematical
expressions involving the defining parameters of the mixture Gaussian PDF
estimates and the a priori class probabilities and misclassification costs.

Let $N$ denote the dimension of the input vector $X$, and let $M$ denote the number
of different class labels in the training set $T$. For $j = 1, \ldots, M$, let $G_j \ge 1$
denote the total number of different components in the $j$-th class mixture PDF. Let
$p_{ij}(X)$ denote the multivariate PDF of the $i$-th component in the mixture for
class $j$, and let $\pi_{ij}$ denote the proportion of component $i$ in class $j$. The
“within-class” mixing proportions $\pi_{ij}$ are non-negative and satisfy the equations

$$\sum_{i=1}^{G_j} \pi_{ij} = 1, \qquad j = 1, \ldots, M. \tag{1}$$

The PDF of class $j$, denoted by $f_j(X)$, is approximated by a general mixture PDF,
denoted by $g_j(X)$, that is,

$$f_j(X) \approx g_j(X) = \sum_{i=1}^{G_j} \pi_{ij}\, p_{ij}(X), \qquad j = 1, \ldots, M. \tag{2}$$

In this paper, only multivariate homoscedastic Gaussian mixtures are considered;
hence $p_{ij}(X)$ has the form

$$p_{ij}(X) = (2\pi)^{-N/2} |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (X - \mu_{ij})^t\, \Sigma^{-1} (X - \mu_{ij}) \right\}, \tag{3}$$

where $\mu_{ij}$ is the mean vector and $\Sigma$ is the positive definite covariance matrix of
$p_{ij}(X)$, and where superscript $t$ denotes transpose. The covariance matrix $\Sigma$ is
chosen independent of the class index $j$ and the component index $i$ for reasons
discussed in Section III. The results presented in this Section are readily extended
to the more general (heteroscedastic) case of different covariance matrices. For
details, see [3].
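
As an illustration of equations (2) and (3), the following minimal Python sketch evaluates a homoscedastic Gaussian mixture in the bivariate case ($N = 2$, as in the examples of Section V). The function names are hypothetical, and the $2 \times 2$ covariance is inverted in closed form only to keep the sketch dependency-free; it is not the authors' implementation.

```python
import math


def gaussian_pdf(x, mean, cov):
    """Bivariate Gaussian density p_ij(X) of equation (3) (assumes N = 2)."""
    (a, b), (c, d) = cov
    det = a * d - b * c  # |Sigma|, positive for a positive definite matrix
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (X - mu)^t Sigma^{-1} (X - mu) via the closed-form 2x2 inverse
    q = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    # Normalizer (2 pi)^{-N/2} |Sigma|^{-1/2} with N = 2
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def mixture_pdf(x, proportions, means, cov):
    """Class mixture g_j(X) of equation (2); every component shares the same cov."""
    return sum(p * gaussian_pdf(x, m, cov) for p, m in zip(proportions, means))


# Two equally weighted components, evaluated at the origin.
identity = [[1.0, 0.0], [0.0, 1.0]]
g = mixture_pdf([0.0, 0.0], [0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], identity)
```

Because all components share one $\Sigma$, the determinant and inverse could be computed once and reused; the sketch recomputes them per call for clarity.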

Let $\alpha_l$ denote the a priori probability of class $l$, that is, $\{\alpha_l\}$ are the
“between-class” mixing proportions. Let $c_{jl}$ denote the loss associated with
classifying an input vector $X$ into class $j$ when the correct decision should have
been class $l$. The risk $\rho_j(X)$ of classifying the input $X$ into class $j$ is the
expected loss; up to the factor $1/\sum_{l} \alpha_l f_l(X)$, which is the same for every
$j$ and therefore does not affect the decision,

$$\rho_j(X) \approx \sum_{l=1}^{M} c_{jl}\, \alpha_l\, g_l(X). \tag{4}$$

The decision risk $\rho_j(X)$ is, thus, approximated by a mixture of Gaussian PDFs, as
is seen by substituting (2) into (4). The minimum risk decision rule is to classify $X$
into the class $\hat{j}$ having the minimum risk, that is, $\hat{j} = \arg\min_j \{\rho_j(X)\}$. The
decision $\hat{j}$ is the optimum Bayesian classification decision [1, Chapter 13] if
$g_j(X) = f_j(X)$ for all $j$, that is, provided the approximation (2) is an equality. (Ties
for minimum risk occur with probability zero, so they can be decided arbitrarily in
practice.) The simple task of selecting the minimizing index $\hat{j}$ can be performed in
many ways, and it has been pointed out [4] that a NN structure can be used for this
task if desired.
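
The minimum risk rule can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the class PDFs $g_l(X)$ are assumed to be already evaluated at the input $X$, and classes are indexed from 0 rather than 1.

```python
def decision_risks(class_pdfs, priors, costs):
    """Approximate risks rho_j(X) = sum_l c_jl * alpha_l * g_l(X), equation (4).

    class_pdfs[l] is g_l(X) at the input X, priors[l] is alpha_l, and
    costs[j][l] is c_jl, the loss for choosing class j when class l is correct.
    """
    m = len(class_pdfs)
    return [sum(costs[j][l] * priors[l] * class_pdfs[l] for l in range(m))
            for j in range(m)]


def min_risk_decision(class_pdfs, priors, costs):
    """Index of the class with minimum risk (ties go to the lowest index)."""
    risks = decision_risks(class_pdfs, priors, costs)
    return min(range(len(risks)), key=risks.__getitem__)


g = [0.02, 0.05, 0.01]
priors = [1 / 3, 1 / 3, 1 / 3]
zero_one = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# Make "choose class 1 when class 0 is correct" ten times as costly.
skewed = [[0, 1, 1], [10, 0, 1], [1, 1, 0]]
```

With zero-one costs the rule picks the class with the largest $\alpha_l g_l(X)$ (class 1 here); raising the single cost $c_{10}$ moves the decision to class 0.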

The nodes in each of the four layers play specific roles in the PNN. The output of a
fourth layer node is the risk $\rho_j(X)$ of choosing class $j$, as given by the approximation
(4). The fourth layer therefore requires as many nodes as classes, namely $M$. A
fourth layer node is conceptually equivalent to a decision risk. The output of a third
layer node is $g_j(X)$, the approximate class conditional PDF given by equation (2).
A node is needed for each class, so there are $M$ third layer nodes. A third layer
node is conceptually equivalent to a statistical hypothesis. The output of a second
layer node is the likelihood $p_{ij}(X)$ of a component in a class mixture. A second layer
node is needed for each Gaussian component in each class, so there are

$$G = G_1 + G_2 + \cdots + G_M$$

second layer nodes. A second layer node is conceptually equivalent to a multivariate
Gaussian random variable. A first layer node is needed for each degree of freedom in
the $\chi^2$-distributed exponent (the expression in braces in equation (3)) of every
multivariate Gaussian component. There are $N$ degrees of freedom and $G$ components,
so there are $NG$ nodes in the first layer.

The activation function appropriate for a node depends upon the layer in which
it resides. All fourth and third layer nodes use the identity function with a zero
threshold, or bias. The second layer nodes use the function $\exp(-z/2)$ with a zero
bias. The first layer nodes all use the activation function $|z|^2$, but the biases vary
from node to node across the layer. Explicitly, the first layer biases are given by

$$\tau_{ijk} = -\left(L^{-1}\mu_{ij}\right)_k, \qquad i = 1, \ldots, G_j, \quad j = 1, \ldots, M, \quad k = 1, \ldots, N,$$

where $L$ is any square root matrix factor of the covariance matrix $\Sigma$, that is, $\Sigma = LL^t$.
The Cholesky factor is one such square root. The bias $\tau_{ijk}$ depends on the destination
second layer node via the mean vector $\mu_{ij}$. Further discussion of the activation
functions and biases of the first three layers of the PNN is given in [3].

The description of the trained PNN is completed by defining interconnection
weights between the layers and giving their specific roles in the PNN. We begin with
the top two layers and work down the NN. The interconnection weight between node
$l$ in the fourth layer and node $j$ in the third layer is the product $c_{lj}\alpha_j$. These weights
characterize decision risk formation. The interconnection weight between a third
layer node (class mixture) and a second layer node (Gaussian component) is zero if
the component does not belong to the class mixture, and is the mixing proportion
$\pi_{ij}$ if it is component $i$ of the class $j$ mixture. These weights characterize mixture
formation. The interconnection weight between a second and a first layer node is
either 1 or 0, depending on whether or not a given degree of freedom (first layer
node) belongs to a given Gaussian random variable (second layer node). These
weights characterize $\chi^2$ random variable formation. Finally, the interconnection
weights between the first and input layers are given by the entries of the inverse $L^{-1}$
of the square root matrix factor $L$ of the covariance matrix $\Sigma$. There are a total of $G$
components, and $L^{-1}$ is $N \times N$, so this gives $GN^2$ interconnection weights. If $L$ is
chosen to be the Cholesky factor of $\Sigma$, then $L^{-1}$ is lower triangular and nearly half
the weights between the first and input layers are zero. Alternatively, the matrix $L$
can be chosen so that it characterizes the discrete Karhunen-Loeve transformation
corresponding to $\Sigma$, that is, $L^{-1} = \Lambda^{-1/2} U^t$, where $\Sigma = U \Lambda U^t$ is the singular value
decomposition of $\Sigma$. In this case, the sparsity of $L^{-1}$ is not immediately evident. A
more detailed description of interconnection weights is given in [3].
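
To make the layer-by-layer description concrete, the sketch below evaluates the four-layer network for $N = 2$ and compares it with a direct evaluation of (4). It rests on assumptions not fixed by the text above: $L$ is taken to be the Cholesky factor, the first layer bias is taken as $\tau_{ijk} = -(L^{-1}\mu_{ij})_k$ with activation $|z|^2$ (that is, $z^2$), and the normalizing constant $(2\pi)^{-N/2}|\Sigma|^{-1/2}$ is dropped. Because $\Sigma$ is shared by all components, that constant scales every risk by the same positive factor and cannot change the minimizing class. All names are hypothetical.

```python
import math


def cholesky2(cov):
    """Lower-triangular L with cov = L L^t (2x2 case)."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    return [[l11, 0.0], [l21, math.sqrt(d - l21 * l21)]]


def inv_lower2(L):
    """Inverse of a 2x2 lower-triangular matrix; it is again lower triangular."""
    (l11, _), (l21, l22) = L
    return [[1.0 / l11, 0.0], [-l21 / (l11 * l22), 1.0 / l22]]


def matvec2(W, v):
    return [W[0][0] * v[0] + W[0][1] * v[1], W[1][0] * v[0] + W[1][1] * v[1]]


def pnn_forward(x, classes, cov, priors, costs):
    """Fourth-layer outputs rho_l(X), omitting the common Gaussian normalizer.

    classes[j] is a list of (pi_ij, mu_ij) pairs describing the class j mixture.
    """
    W = inv_lower2(cholesky2(cov))  # input-to-first-layer weights: L^{-1}
    wx = matvec2(W, x)
    third = []
    for comps in classes:
        g = 0.0
        for pi_ij, mu in comps:
            tau = [-v for v in matvec2(W, mu)]                 # first-layer biases
            first = [(wx[k] + tau[k]) ** 2 for k in range(2)]  # activation z^2
            second = math.exp(-0.5 * sum(first))               # weights 1, exp(-z/2)
            g += pi_ij * second                                # weight pi_ij, identity
        third.append(g)
    m = len(classes)
    # Weight c_lj * alpha_j from third-layer node j to fourth-layer node l.
    return [sum(costs[l][j] * priors[j] * third[j] for j in range(m)) for l in range(m)]


def direct_risks(x, classes, cov, priors, costs):
    """The same quantity computed straight from (3)-(4) using Sigma^{-1}."""
    (a, b), (c, d) = cov
    det = a * d - b * c

    def kernel(mu):
        dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
        return math.exp(-0.5 * (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det)

    g = [sum(p * kernel(mu) for p, mu in comps) for comps in classes]
    m = len(classes)
    return [sum(costs[l][j] * priors[j] * g[j] for j in range(m)) for l in range(m)]


classes = [[(0.6, [0.0, 0.0]), (0.4, [2.0, 1.0])], [(1.0, [3.0, 3.0])]]
cov = [[4.0, 1.0], [1.0, 3.0]]
priors, costs = [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]]
net = pnn_forward([1.0, 2.0], classes, cov, priors, costs)
ref = direct_risks([1.0, 2.0], classes, cov, priors, costs)
```

The two computations agree because $\|L^{-1}(X - \mu_{ij})\|^2 = (X - \mu_{ij})^t \Sigma^{-1} (X - \mu_{ij})$ whenever $\Sigma = LL^t$.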




3 Generalized Fisher Training of Probabilistic Neural Networks


The PNN proposed by Specht [5] is a special case of the PNN described in Section
II, as is seen by setting the costs $c_{ll} = 0$ for all $l$ and $c_{jl} = 1$ for $j \ne l$, and noting that
the fourth layer is essentially superfluous in this case: with these costs,
$\rho_j(X) = \sum_{l} \alpha_l g_l(X) - \alpha_j g_j(X)$, so minimizing the risk is the same as maximizing
$\alpha_j g_j(X)$. Specht’s PNN implements the Parzen window PDF estimator [6] using the
so-called product Gaussian (i.e., uncorrelated Gaussian) window. The Parzen window
sets the interconnection weights and nodal activation functions. Specht’s PNN is thus
a three layer feed-forward NN that uses mixtures of uncorrelated Gaussians to
estimate the class conditional PDFs.

Specht’s PNN is an excellent tool for initial exploration of new large training sets.
Nonetheless, its usefulness in practice is limited by two factors. First, because it
is based on the Parzen window estimator, the total number $G$ of Gaussian components
must equal the number of samples in the training set $T$. Therefore, it requires
large amounts of data storage when extensive training sets are available. Second,
an intrinsic smoothing parameter must be estimated on the basis of classification
performance. Since robust estimates of classification performance are difficult to
establish for small sample sizes, estimates of the smoothing parameter may be
unreliable in practice. Both factors can often be mitigated by heuristics suited to the
particular application. The contribution of generalized Fisher (GF) training is that
it successfully treats both these problems without the need for heuristics.
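
For comparison, a minimal sketch of a Parzen window classifier in the spirit of Specht's PNN follows. It assumes an uncorrelated Gaussian window whose single width `sigma` (a hypothetical parameter name) is shared by all classes, together with zero-one costs. The point of the sketch is that every training sample becomes one Gaussian component, so storage grows with $\#(T)$, and `sigma` has to be chosen by some criterion outside the estimator itself.

```python
import math


def parzen_pdf(x, samples, sigma):
    """Parzen estimate of a class PDF with an uncorrelated Gaussian window of width sigma."""
    n = len(x)
    norm = (2.0 * math.pi) ** (n / 2.0) * sigma ** n
    total = 0.0
    for s in samples:  # one Gaussian component per stored training sample
        d2 = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-d2 / (2.0 * sigma * sigma))
    return total / (len(samples) * norm)


def parzen_classify(x, training_sets, sigma, priors=None):
    """Pick the class maximizing alpha_j * g_j(X), the zero-one cost decision."""
    m = len(training_sets)
    if priors is None:
        priors = [1.0 / m] * m
    scores = [a * parzen_pdf(x, t, sigma) for a, t in zip(priors, training_sets)]
    return max(range(m), key=scores.__getitem__)


training_sets = [[[0.0, 0.0], [0.5, 0.2]], [[5.0, 5.0], [4.5, 5.2]]]
```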

The GF trained PNN requires significantly fewer nodes and interconnection
weights than Specht’s PNN in most problems of practical interest. A careful
comparison of the two architectures below the third layer shows that the GF trained
PNN is more efficient in both nodes and weights if


G 


(5) 


where $\beta$ is the sparseness index of the inverse square root matrix factor of the
covariance matrix $\Sigma$, that is, $L^{-1}$ has $\beta N^2$ nonzero entries. $L^{-1}$ is fully dense if
$\beta = 1$ and least dense (diagonal) if $\beta = 1/N$. By choosing $L$ to be the Cholesky
factor, the index $\beta$ can always be made at least as small as $(N+1)/(2N) \approx 1/2$,
since a lower triangular $N \times N$ matrix has at most $N(N+1)/2$ nonzero entries.
From inequality (5), it is clear that the trained PNN is most effective in reducing node
and weight requirements in large training set problems. For small training sets, the
reduction in the number of nodes and weights depends on the sparseness index $\beta$ of
$L^{-1}$.
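
The Cholesky bound on $\beta$ can be checked numerically. The pure-Python sketch below (helper names hypothetical) factors a small symmetric positive definite matrix, inverts the lower triangular factor by forward substitution, and counts the nonzero entries of $L^{-1}$. For the $4 \times 4$ example matrix every entry on or below the diagonal is nonzero, giving $\beta = 10/16 = (N+1)/(2N)$.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^t, computed row by row."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def inv_lower(L):
    """Inverse of a lower-triangular matrix, column by column via forward substitution."""
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for i in range(c, n):
            rhs = (1.0 if i == c else 0.0) - sum(L[i][k] * X[k][c] for k in range(c, i))
            X[i][c] = rhs / L[i][i]
    return X


def sparseness_index(m):
    """beta = (number of nonzero entries) / N^2."""
    n = len(m)
    return sum(1 for row in m for v in row if v != 0.0) / (n * n)


sigma = [[4.0, 1.0, 0.5, 0.2],
         [1.0, 3.0, 0.3, 0.1],
         [0.5, 0.3, 2.0, 0.4],
         [0.2, 0.1, 0.4, 1.5]]
L = cholesky(sigma)
Linv = inv_lower(L)
beta = sparseness_index(Linv)  # 10 nonzeros out of 16
```

With $G$ components, the first-to-input layer then needs about $\beta G N^2$ nonzero weights instead of $GN^2$.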


We begin the discussion of GF training by reviewing a classical treatment of the
two class discrimination problem: Fisher’s linear discriminant (FLD). FLD is based
on the premise that both classes are multivariate Gaussian random variables with a
common covariance matrix $\Sigma$, but different mean vectors $\mu_1$ and $\mu_2$. The available
training set $T$ is assumed to be correctly labelled, and we write $T = T_1 \cup T_2$, where
$T_j$ denotes the subset of $T$ with class label $j$. The sample means

$$\hat{\mu}_j = \frac{1}{\#(T_j)} \sum_{X \in T_j} X, \qquad j = 1, 2, \tag{6}$$

estimate the means $\mu_1$ and $\mu_2$, where $\#(\cdot)$ denotes the cardinality (number of
samples) of a training set. The covariance matrix $\Sigma$ is estimated by Fisher’s
within-class scatter matrix

$$\hat{\Sigma} = \frac{1}{\#(T)} \sum_{j=1}^{2} \sum_{X \in T_j} (X - \hat{\mu}_j)(X - \hat{\mu}_j)^t. \tag{7}$$

The estimation error for $\Sigma$ is reduced by pooling the sample data, i.e., by using
all samples in the training set $T$. Given these estimates, the log-likelihood ratio is
evaluated for an unknown vector $X$ to be classified. The classification decision is
obtained by comparing this ratio to an appropriate threshold. Because the two classes
share $\Sigma$, the quadratic terms in $X$ cancel, so contours of constant log-likelihood
ratio are hyperplanes in the feature space (i.e., $\mathbb{R}^N$), and the hyperplane
corresponding to the threshold value is the FLD.
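
A minimal two-dimensional sketch of the FLD computation follows: sample means from (6), the pooled scatter matrix from (7), and the resulting linear log-likelihood ratio $w^t X + b$ with $w = \hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_2)$. The restriction to $N = 2$, the zero threshold (equal priors and equal costs), and the function names are assumptions of the sketch.

```python
def sample_mean(samples):
    """Equation (6): the average of the samples in one class."""
    n = len(samples)
    return [sum(s[k] for s in samples) / n for k in range(len(samples[0]))]


def pooled_scatter(sets):
    """Equation (7): within-class scatter pooled over all classes, divided by #(T)."""
    total = sum(len(t) for t in sets)
    S = [[0.0, 0.0], [0.0, 0.0]]
    for t in sets:
        m = sample_mean(t)
        for x in t:
            d = (x[0] - m[0], x[1] - m[1])
            for r in range(2):
                for c in range(2):
                    S[r][c] += d[r] * d[c] / total
    return S


def fisher_discriminant(t1, t2):
    """Return (w, b) such that log f1(X)/f2(X) = w . X + b for Gaussians sharing Sigma."""
    m1, m2 = sample_mean(t1), sample_mean(t2)
    (s11, s12), (s21, s22) = pooled_scatter([t1, t2])
    det = s11 * s22 - s12 * s21
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    w = [inv[r][0] * diff[0] + inv[r][1] * diff[1] for r in range(2)]
    # The quadratic terms X^t Sigma^{-1} X cancel, leaving only this constant offset.
    b = -0.5 * sum(w[k] * (m1[k] + m2[k]) for k in range(2))
    return w, b


t1 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t2 = [[4.0, 4.0], [5.0, 4.0], [4.0, 5.0], [5.0, 5.0]]
w, b = fisher_discriminant(t1, t2)
```

A sample $X$ is assigned to class 1 when $w^t X + b$ exceeds the threshold, so the decision boundary here is the line $w^t X + b = 0$, which passes through the midpoint $(2.5, 2.5)$ of the two sample means.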

The FLD is known to be robust in the sense that linearly separable classes are
often successfully discriminated in practice when neither class is truly Gaussian.
Even when both classes are Gaussian but have different covariance matrices, some
authors have observed that the FLD is often a better classifier than the optimum
quadratic discriminator. For fixed training set size, the increased estimation error
in the two covariance matrices that results from not pooling the training samples is,
presumably, the cause of the relatively greater robustness of the FLD in this case.

Pooling the training samples provides a natural way of developing PDF estimates
for classes that have few samples in the training set T. In applications where samples
from  different  classes  have  broadly  similar  correlational  structure,  it  is  reasonable 
to  pool  the  training  samples  when  the  sample  size  is  small.  Moreover,  in  practice, 
in  the  absence  of  a  priori  information  to  the  contrary,  it  is  probably  inevitable  that 
pooling  will  be  used  to  generalize  small  sample  set  classes.  Pooling  is  the  basic 
strategy  adopted  in  this  paper. 

GF training is a generalization of the FLD methodology. It uses a homoscedastic
“mixture of mixtures” assumption to formulate a posterior likelihood function $\mathcal{L}$ for
the entire training set $T$. An ordinary homoscedastic mixture PDF is a mixture
in which the components share a common covariance matrix $\Sigma$. By the term
homoscedastic mixture of mixtures, we mean that a common covariance matrix $\Sigma$ is
used within each class mixture and also across all classes represented in the training
set $T$. The likelihood function $\mathcal{L}$ is highly nonlinear in the defining parameters of
the mixture of mixtures, and, because $\Sigma$ is shared by all classes, it is not generally
possible to factor it into terms depending on only one class label. Maximum
likelihood parameter estimates for each class PDF are therefore jointly dependent on
training samples across all classes.

Maximum likelihood parameter estimates for the mixture of mixtures are obtained
numerically by utilizing an algorithm based on the Expectation-Maximization
(EM) method [7]. The derivation of the GF training algorithm is given in the
Appendix. The remainder of this Section is devoted to formulating the likelihood
function $\mathcal{L}$, to stating the most important properties of GF training, and to
interpreting its maximum likelihood solution. Equations (74)-(78) summarize the
GF algorithm iteration from step $n$ to step $n+1$.



The parameter set $\lambda$ defining the homoscedastic mixture of mixtures comprises the
following variables:

•  $\alpha_j$ = the a priori probability of class $j$,

•  $\pi_{ij}$ = the mixing proportion of component $i$ in class $j$,

•  $\mu_{ij}$ = the mean vector of component $i$ in class $j$, and

•  $\Sigma$ = the common covariance matrix of all Gaussians.

Thus, $\lambda$ comprises a total of $M + G + NG + N^2$ real variables, though not all of
them are independent (e.g., $\Sigma$ is symmetric and mixing proportions sum to 1). For
$j = 1, \ldots, M$, let

$$\lambda_j = \{\pi_{ij}, \mu_{ij}, \Sigma\}_{i=1}^{G_j}$$

denote the parameters defining the homoscedastic Gaussian mixture for class $j$. The
labelled training set $T$ is partitioned into the disjoint subsets

$$T = T_1 \cup T_2 \cup \cdots \cup T_M,$$

where $T_j$ comprises those samples in $T$ with class label $j$. The posterior likelihood
function $\mathcal{L}(T|\lambda)$ is defined on $T$ by assuming that the samples in $T_j$ are independent
for each $j$, and that the class labels are assigned correctly. From these assumptions,
it follows from equation (23) in the Appendix that the GF log-likelihood function is

$$\log \mathcal{L}(T|\lambda) = \sum_{j=1}^{M} \sum_{X \in T_j} \log\left[\alpha_j\, g_j(X|\lambda_j)\right],$$

where the function $g_j(X|\lambda_j)$ is identical to the class PDF $g_j(X)$ defined by equation
(2). Estimating the parameter set $\lambda$ is the central task of GF training.
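
The GF log-likelihood can be evaluated directly from this expression. The sketch below is limited to $N = 2$ and uses hypothetical names; each class is given by its training subset $T_j$, its prior $\alpha_j$, and a list of $(\pi_{ij}, \mu_{ij})$ pairs, with one covariance matrix shared by all components.

```python
import math


def gaussian_pdf(x, mu, cov):
    """Bivariate Gaussian density of equation (3)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    q = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def gf_log_likelihood(training_sets, priors, classes, cov):
    """log L(T | lambda) = sum_j sum_{X in T_j} log[alpha_j * g_j(X | lambda_j)]."""
    total = 0.0
    for t_j, alpha_j, comps in zip(training_sets, priors, classes):
        for x in t_j:
            # Each labelled sample is scored only under the mixture of its own class.
            g_j = sum(p * gaussian_pdf(x, mu, cov) for p, mu in comps)
            total += math.log(alpha_j * g_j)
    return total


identity = [[1.0, 0.0], [0.0, 1.0]]
# One sample per class, each sitting exactly on its single component mean.
ll = gf_log_likelihood([[[0.0, 0.0]], [[3.0, 3.0]]], [0.5, 0.5],
                       [[(1.0, [0.0, 0.0])], [(1.0, [3.0, 3.0])]], identity)
```

Since each sample enters only through the mixture of its own class, the classes are coupled solely through the shared covariance matrix.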

The GF training algorithm converges to a local maximum likelihood estimate
$\hat{\lambda}_{ML}$ for $\lambda$. The EM method for mixtures is derived in [8] for one class, i.e., the
special case $M = 1$. It is extended to the GF likelihood function in the Appendix.


GF training is an iterative procedure that computes stationary points of the posterior
likelihood function $\mathcal{L}$ without taking gradients or derivatives. It begins with an
initial guess, say $\lambda$, for the optimum parameters, and each iteration gives a new
parameter estimate, say $\lambda^+$, that is guaranteed to increase the value of the posterior
likelihood function $\mathcal{L}$ unless $\lambda$ is a stationary point of $\mathcal{L}$, in which case $\lambda = \lambda^+$.
Consequently, if the GF algorithm iterates are bounded above, as they typically are
in applications (see the Theorem in the Appendix), the GF training algorithm must
converge to a stationary point. In practice, stationary points of $\mathcal{L}$ are also points of
local maxima. By restarting the GF training algorithm with different initial guesses
$\lambda$ and choosing the best of the local maxima so obtained, a satisfactory maximum
likelihood estimate for $\lambda$ can be found.
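
The following sketch shows one EM iteration for the homoscedastic mixture of mixtures with $N = 2$. It was written from the standard EM updates for labelled data with a shared covariance (posterior component weights in the E-step; normalized counts, weighted means, and a pooled weighted scatter matrix in the M-step), not transcribed from equations (74)-(78) of the Appendix. Names and the synthetic test data are hypothetical. The usage lines record $\log \mathcal{L}$ after each iteration so that the monotone increase described above can be observed.

```python
import math
import random


def gaussian_pdf(x, mu, cov):
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    q = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def gf_log_likelihood(training_sets, priors, classes, cov):
    return sum(math.log(alpha * sum(p * gaussian_pdf(x, mu, cov) for p, mu in comps))
               for t_j, alpha, comps in zip(training_sets, priors, classes) for x in t_j)


def gf_em_step(training_sets, classes, cov):
    """One EM iteration; classes[j] is a list of (pi_ij, mu_ij). Returns (priors, classes, cov)."""
    total = sum(len(t) for t in training_sets)
    priors, new_classes = [], []
    scatter = [[0.0, 0.0], [0.0, 0.0]]
    for t_j, comps in zip(training_sets, classes):
        # E-step: posterior probability of each component, given X and its class label j.
        resp = []
        for x in t_j:
            dens = [p * gaussian_pdf(x, mu, cov) for p, mu in comps]
            g_j = sum(dens)
            resp.append([v / g_j for v in dens])
        updated = []
        for i in range(len(comps)):
            w = [r[i] for r in resp]
            n_i = sum(w)  # expected number of class-j samples drawn from component i
            mu = [sum(wk * x[k] for wk, x in zip(w, t_j)) / n_i for k in range(2)]
            updated.append((n_i / len(t_j), mu))
            # Pooled scatter, accumulated over all components of all classes.
            for wk, x in zip(w, t_j):
                d = (x[0] - mu[0], x[1] - mu[1])
                for r in range(2):
                    for c in range(2):
                        scatter[r][c] += wk * d[r] * d[c] / total
        priors.append(len(t_j) / total)  # alpha_j = #(T_j) / #(T)
        new_classes.append(updated)
    return priors, new_classes, scatter


random.seed(1)
t0 = ([[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(40)]
      + [[random.gauss(5, 1), random.gauss(0, 1)] for _ in range(40)])
t1 = [[random.gauss(2, 1), random.gauss(4, 1)] for _ in range(50)]
sets = [t0, t1]
priors = [0.5, 0.5]
classes = [[(0.5, [1.0, 0.5]), (0.5, [3.0, -0.5])], [(1.0, [2.0, 3.0])]]
cov = [[1.0, 0.0], [0.0, 1.0]]
history = [gf_log_likelihood(sets, priors, classes, cov)]
for _ in range(10):
    priors, classes, cov = gf_em_step(sets, classes, cov)
    history.append(gf_log_likelihood(sets, priors, classes, cov))
```

No gradients are taken: every update is a weighted count, a weighted average, or a weighted scatter sum, which is what keeps each iteration inexpensive.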

An  intuitive  understanding  of  the  shape  of  the  decision  (discriminant)  surface 
can  be  gained  in  certain  instances.  Consider  the  two  class  problem.  The  FLD  always 
has  a  linear  decision  boundary,  as  remarked  above,  but  GF  training  will  not  result  in 
linear  decision  boundaries  in  general.  The  nonlinear  decision  boundary  will  be  very 
flat  (linear)  wherever  the  input  vector  X  lies  “close”  to  only  one  component  in  each 
of  the  two  classes.  The  reason  is  that  the  log-likelihood  ratio  behaves  locally  like  the 
FLD  in  this  case.  Intuitively,  then,  the  GF  decision  surface  comprises  several  nearly 
flat  sections  that  are  joined  together  by  smoothly  varying  transitional  surfaces.  This 
intuitive  image  suggests  that  GF  training  may  be  robust  against  overtraining  on  the 
sample  set  T  when  G  is  in  some  sense  small  compared  to  the  size  of  the  training  set 
T.  The  image  also  suggests  a  “decision  directed”  method  for  obtaining  piecewise 
linear  discriminants,  and  this  is  discussed  briefly  in  Section  VII. 

Finally, GF training has an important translation property that shows clearly
that GF training is based on PDF estimation and not on class discrimination. To
be explicit, for $j = 1, \ldots, M$, let $\Psi_j$ denote the training set $T_j$ after translation by
a given vector $\phi_j$, and let $\Psi$ denote the union of all sets $\Psi_j$. Suppose GF training
applied to the translated training set $\Psi$ converges to the parameter set $\lambda_\Psi$, and
that GF training applied to the set $T$ converges to the parameter set $\lambda_T$. Then
the parameter sets $\lambda_\Psi$ and $\lambda_T$ are translates, that is, they are identical, except that
the mean vector in $\lambda_\Psi$ of the $i$-th component in the $j$-th mixture is $\mu_{ij} + \phi_j$, where
$\mu_{ij}$ is the corresponding mean vector in $\lambda_T$. This result assumes that the initial
parameter sets are also translates. It follows from the translation property that
the estimated class conditional PDFs are independent of the between-class sample
separations. Classification performance of GF training is therefore determined by
two independent factors: (1) the separation of the class means, and (2) the detailed
shape of the individual class conditional PDFs.
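
The translation property can be checked numerically. To keep the sketch short, it uses a univariate ($N = 1$) version of the standard EM updates with a shared variance; names and data are hypothetical. It shifts the samples and initial means of each class by a class-specific offset $\phi_j$ and compares the fitted parameters with those obtained from the unshifted data.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gf_em_1d(training_sets, classes, var, iters=20):
    """Univariate GF/EM sketch: classes[j] = [(pi_ij, mu_ij), ...]; var is shared by all."""
    total = sum(len(t) for t in training_sets)
    for _ in range(iters):
        new_classes, ss = [], 0.0
        for t_j, comps in zip(training_sets, classes):
            resp = []
            for x in t_j:
                dens = [p * normal_pdf(x, mu, var) for p, mu in comps]
                g_j = sum(dens)
                resp.append([v / g_j for v in dens])
            updated = []
            for i in range(len(comps)):
                w = [r[i] for r in resp]
                n_i = sum(w)
                mu = sum(wk * x for wk, x in zip(w, t_j)) / n_i
                # Only deviations x - mu enter, so shifting x and mu together changes nothing.
                ss += sum(wk * (x - mu) ** 2 for wk, x in zip(w, t_j))
                updated.append((n_i / len(t_j), mu))
            new_classes.append(updated)
        classes, var = new_classes, ss / total
    return classes, var


t = [[0.1, 0.4, -0.3, 2.0, 2.4, 1.8, 2.2], [5.0, 5.5, 4.7, 6.1]]
init = [[(0.5, 0.0), (0.5, 2.0)], [(1.0, 5.0)]]
phi = [10.0, -3.0]  # class-specific translations

shifted_t = [[x + f for x in t_j] for t_j, f in zip(t, phi)]
shifted_init = [[(p, mu + f) for p, mu in comps] for comps, f in zip(init, phi)]

fit_t, var_t = gf_em_1d(t, init, 1.0)
fit_psi, var_psi = gf_em_1d(shifted_t, shifted_init, 1.0)
```

The mixing proportions and the shared variance come out the same, and each mean moves by exactly its class offset, up to rounding.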

4 Training A Priori Class Probabilities and Misclassification Costs

The GF training algorithm gives explicit maximum likelihood estimates for the class
a priori probabilities $\{\alpha_j\}$ without iteration. From equation (74), the maximum
likelihood estimate of $\alpha_j$ is $\hat{\alpha}_j = \#(T_j)/\#(T)$; in words, $\hat{\alpha}_j$ represents the relative
abundance of class $j$ in the training set $T$. Clearly, the training set $T$ contains
significant a priori class probability information only if it has been carefully compiled.
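
As a small illustration, the estimate $\hat{\alpha}_j = \#(T_j)/\#(T)$ is simply a normalized count of labels. Applied to the class sizes of Example 1 in Section V (1960, 720, and 500 samples), it would give priors far from the equal values 1/3 that those examples use, which reflects the remark there that the relative abundance in $T$ does not reflect the a priori class probabilities. The function name is hypothetical.

```python
from collections import Counter


def ml_class_priors(labels):
    """alpha_j = #(T_j) / #(T): the relative abundance of each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in sorted(counts.items())}


labels = [1] * 1960 + [2] * 720 + [3] * 500  # class sizes of Example 1
priors = ml_class_priors(labels)
```

Here the estimates are roughly 0.616, 0.226, and 0.157.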

Standard statistical practice requires screening the training set to eliminate
outliers and other anomalies. Moreover, it may also be necessary to screen the training
set to ensure correctly labelled samples. If careful attention is not given to these
important tasks, the resulting training set $T$ will contain little or no meaningful
information regarding the class a priori probabilities. In this case, the likelihood
function $\mathcal{L}$ must be modified slightly. It becomes

$$\log \mathcal{L}(T|\lambda \setminus \{\alpha_j\}) = \sum_{j=1}^{M} \sum_{X \in T_j} \log g_j(X|\lambda_j),$$

where $\lambda \setminus \{\alpha_j\}$ denotes the parameter set $\lambda$ with $\{\alpha_j\}$ removed. It is
straightforward to show, using the methods of the Appendix, that the GF training
algorithm modified for this likelihood function is identical to equations (75)-(78). The
sole difference is that $\{\alpha_j\}$ are no longer estimated. We will refer to both algorithms
as GF training algorithms.

The preceding comments should not obscure the fact that it is still necessary to
train class a priori probabilities in applications in which these quantities cannot be
estimated from the training set $T$. Although one may resort to information theoretic
concepts such as entropy [9], a more appropriate recourse for many applications is to
exploit a priori information not immediately available from within the training set $T$,
as it is defined in Section I. The same is true as well concerning the misclassification
costs $\{c_{jl}\}$. Training these fundamental quantities is important if near-optimum
classification performance is to be attained in practice.

The essential difficulty is that the likelihood function $\mathcal{L}(T|\lambda \setminus \{\alpha_j\})$ is
independent of the misclassification costs $\{c_{jl}\}$ and the probabilities $\{\alpha_j\}$. No
maximum likelihood training algorithm can estimate factors missing from the
fundamental likelihood structure. Thus, $\mathcal{L}$ must be modified to include dependence on
a priori information. This task requires intimate knowledge of the particular application
together, perhaps, with additional observation time history. Although time history
can be included in $T$, it is clear that training $\{c_{jl}\}$ and $\{\alpha_j\}$ may require an
extensive modification of $\mathcal{L}(T|\lambda \setminus \{\alpha_j\})$ and involve information and methods
outside the scope of the present paper.

5  GF  Training  Examples 

Three examples are presented to illustrate the effectiveness of GF training on
different size training sets. To focus clearly on the small training set problem, the
same classes are used in all the examples, and training is performed on different size
subsets of the available training set $T$. The effect of using different subsets of $T$ on
classification performance is assessed in Section VI.


Each example comprises three classes defined on $\mathbb{R}^2$, that is, the dimension of
the input vector $X$ is $N = 2$. The samples in $T$ are measured data, not simulated.
Because of the way $T$ was gathered and screened in the application, the
relative abundance of sample data for each class does not reflect the a priori class
probabilities $\{\alpha_j\}$. The class prior probabilities $\alpha_j$ are chosen equal to 1/3. The
misclassification costs $c_{jl}$ in equation (4) are chosen equal to 0 if $l = j$ and equal to
1/2 if $l \ne j$. With these choices, the optimum classifier is equivalent to the well-known
maximum likelihood classifier.

Given model orders $\{G_j\}$, GF training is defined for these examples by equations
(75)-(78). The best choice of $G_j$, the number of (bivariate) Gaussian components in
the mixture PDF for class $j$, is a model order selection problem, and its solution is
application dependent. Typically, $G_j$ should be chosen as small as possible without
losing classification performance. The study of the order selection problem is greatly
facilitated by the numerical robustness of GF training; however, this important
problem is outside the scope of the present paper. Overtraining is the only aspect
of this problem upon which we will comment.

GF training requires initial values for the class mixture parameters. For class $j$,
the initial mixing proportions $\pi_{ij}$ were chosen equal to $1/G_j$. The initial covariance
matrix was the within-class scatter matrix of the training set (cf. equation (7) for two
classes). The initial mean vectors for Example 1 were chosen randomly within
a square containing the appropriate class samples. The initial mean vectors for
Example 2 were a subset of those for Example 1, and those for Example 2 contained
those for Example 3. No effort was made to restart GF training to determine whether
the local maximum likelihood solutions obtained were globally optimum.

To facilitate the discussion of the examples, we define the decision risk by

$$\rho(X) = \min\{\rho_1(X), \rho_2(X), \rho_3(X)\}, \tag{8}$$

where the decision risks $\{\rho_j(X)\}$ are approximated by the right hand side of equation
(4). The decision assurance is defined by

$$\delta(X) = \max\{\alpha_1 g_1(X), \alpha_2 g_2(X), \alpha_3 g_3(X)\}, \tag{9}$$

where the estimated class PDFs $\{g_j(X)\}$ are defined by equation (2). The optimum
class decision is identified by the index $j^*$, where

$$j^* = \arg\min\{\rho_1(X), \rho_2(X), \rho_3(X)\}. \tag{10}$$

Because of the particular choice of costs and class priors, we also have the equivalent
expression $j^* = \arg\max\{g_1(X), g_2(X), g_3(X)\}$.
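
With the costs and priors of these examples, (8)-(10) can be computed together, as in the sketch below (names hypothetical; classes indexed from 0; the $g_j(X)$ values are made up). Here $\rho_j(X) = \tfrac{1}{2}\big(\sum_l \alpha_l g_l(X) - \alpha_j g_j(X)\big)$, so $\rho(X) = \tfrac{1}{2}\big(\sum_l \alpha_l g_l(X) - \delta(X)\big)$: a small decision risk and a large decision assurance go together, and $j^*$ is also the index of the largest $g_j(X)$.

```python
def risk_assurance_decision(class_pdfs, priors=(1 / 3, 1 / 3, 1 / 3), off_cost=0.5):
    """Return (rho(X), delta(X), j*) per equations (8)-(10).

    Costs are c_jl = 0 for l == j and off_cost otherwise, as in the examples.
    """
    m = len(class_pdfs)
    risks = [sum((0.0 if l == j else off_cost) * priors[l] * class_pdfs[l] for l in range(m))
             for j in range(m)]
    rho = min(risks)                                        # equation (8)
    delta = max(a * g for a, g in zip(priors, class_pdfs))  # equation (9)
    j_star = min(range(m), key=risks.__getitem__)           # equation (10)
    return rho, delta, j_star


g = [0.002, 0.010, 0.004]  # hypothetical values of g_1(X), g_2(X), g_3(X)
rho, delta, j_star = risk_assurance_decision(g)
```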

GF training is very efficient in the examples presented in detail below. The
convergence criterion required a relative increase of $10^{-4}$ in the log-likelihood function,
that is, iteration ceased when the current value of $\log \mathcal{L}$ increased by a factor less
than or equal to $1 + 10^{-4}$ times the previous value of $\log \mathcal{L}$. GF training for Example
1 converged in 53 iterations and used approximately two minutes of wall-clock time
(including all file handling and I/O operations). Example 2 converged in 26
iterations in about 5 seconds, while Example 3 converged in 18 iterations in well under
one second. The GF algorithm is implemented in single precision FORTRAN on a
Sun SPARCstation 1.
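
One reading of this stopping rule is sketched below. Because $\log \mathcal{L}$ is usually negative, the sketch assumes the criterion means that the increase in $\log \mathcal{L}$ is at most $10^{-4}$ times its previous magnitude; that interpretation, the iteration cap, and all names are assumptions. In the usage line, a toy sequence whose log-likelihood rises toward $-1$ stands in for a GF iteration.

```python
def converged(log_l_prev, log_l_curr, rel_tol=1e-4):
    """True when log L improved by at most rel_tol times its previous magnitude."""
    return log_l_curr - log_l_prev <= rel_tol * abs(log_l_prev)


def iterate_until_converged(step, state, log_l, rel_tol=1e-4, max_iter=1000):
    """Apply step() until the relative-increase test passes; return (state, iterations)."""
    prev = log_l(state)
    for n in range(1, max_iter + 1):
        state = step(state)
        curr = log_l(state)
        if converged(prev, curr, rel_tol):
            return state, n
        prev = curr
    return state, max_iter


# Toy stand-in for GF training: the state halves and log L = -(1 + state) rises toward -1.
state, n_iter = iterate_until_converged(lambda s: s / 2.0, 1.0, lambda s: -(1.0 + s))
```

The toy run stops after 14 iterations, the first time the gain $2^{-14}$ falls below $10^{-4}$ of $|\log \mathcal{L}|$.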

5.1 Example 1: Large size training set, $T$

The training set $T = T_1 \cup T_2 \cup T_3$ comprises 1960, 720, and 500 two-dimensional
training samples in classes 1, 2, and 3, respectively. Example 1 trains using all the
available samples in $T$ with the choices $G_1 = 8$, $G_2 = 4$, and $G_3 = 2$ for the number
of class mixture components. This choice for $\{G_j\}$ reflects the relative diffuseness
(as compared to an uncorrelated Gaussian distribution) of the training samples in
the various classes. The trained mixing proportions and mean vectors of the class
mixture PDFs are listed in Table 1. The inverse of the trained covariance matrix is


E"1 =  10~3  x 


17.590  -0.76978 

-0.76978  6.0786 


The  eigenvalues  of  E  are  56.6850  and  165.911,  so  E  is  numerically  stable. 
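The eigenvalues quoted for $\Sigma$ can be checked from the printed $\Sigma^{-1}$, since $\Sigma$ has the reciprocals of the eigenvalues of $\Sigma^{-1}$. The NumPy sketch below (the helper name is hypothetical) recovers them together with the condition number $\lambda_{\max}/\lambda_{\min}$, used here as the measure of numerical stability.

```python
import numpy as np

def covariance_spectrum(sigma_inv):
    """Eigenvalues of Sigma (ascending) and its condition number, given Sigma^{-1}."""
    inv_eigs = np.linalg.eigvalsh(sigma_inv)   # eigenvalues of the symmetric Sigma^{-1}
    eigs = np.sort(1.0 / inv_eigs)             # Sigma has the reciprocal eigenvalues
    return eigs, eigs[-1] / eigs[0]

# Inverse covariance reported for Example 1 (entries scaled by 10^-3).
sigma_inv_ex1 = 1e-3 * np.array([[17.590, -0.76978],
                                 [-0.76978, 6.0786]])
eigs, cond = covariance_spectrum(sigma_inv_ex1)
print(eigs, cond)
```

Applied to the $\Sigma^{-1}$ matrices of Examples 2 and 3, the same function reproduces the eigenvalues quoted there to the printed precision.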


                  Class 1                           Class 2                           Class 3
Component   π_i1          μ_i1                π_i2          μ_i2                π_i3          μ_i3
1           0.196423      (85.165, 111.075)   (illegible)   (58.643, 11.885)    (illegible)   (50.180, 49.849)
2           0.174009      (115.670, 43.677)   0.227895      (75.009, 18.953)    0.482362      (50.180, 49.849)
3           (illegible)   (115.499, 79.179)   0.182448      (38.771, 10.130)
4           0.144088      (98.808, 88.775)    0.134909      (16.085, 6.652)
5           0.111436      (114.690, 56.922)
6           0.091604      (99.392, 144.360)
7           0.073182      (83.966, 57.609)
8           0.050569      (117.437, 123.854)

Table 1: Mixing Proportions and Mean Vectors for Example 1


Figure 2 is a scatter plot of the training set T, together with a graph of the three-class discriminant function obtained after GF training is completed. The graph of the discriminant function is the boundary between the regions of the (input) plane that map into the three different classes under the optimum decision rule (10). Inspection of Figure 2 indicates that good generalization of the training set T has taken place. Overtraining has not occurred: overtraining is characterized by highly convoluted discriminant curves, whereas the nonlinear discriminant curves in Figure 2 are smooth.

Figures 3, 4, and 5 depict the level curves, or contours, of the estimated class PDFs. The likelihood levels corresponding to the contours in these and subsequent figures are given in decibels referenced to the maximum PDF level, or dB//max. Generally, to plot a function in dB//max, it is normalized by its maximum value, and then 10 times its base 10 logarithm is taken. Thus, -20 dB//max is equivalent to a level that is $10^{-2}$ times the referenced maximum PDF level. The use of decibels is justified by the dynamic range of the likelihood functions involved. The maximum values of the class PDFs are $4.29 \times 10^{-4}$, $7.91 \times 10^{-4}$, and $16.41 \times 10^{-4}$ for classes 1, 2, and 3, respectively. Class 3 has the largest maximum because it has the most compact PDF.
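The dB//max convention described above amounts to one line of NumPy. In this sketch (the function name is hypothetical), zero-valued inputs map to minus infinity dB rather than raising an error.

```python
import numpy as np

def to_db_max(values):
    """Express non-negative values in dB relative to their maximum (dB//max)."""
    v = np.asarray(values, dtype=float)
    with np.errstate(divide="ignore"):          # log10(0) -> -inf without a warning
        return 10.0 * np.log10(v / v.max())
```

For example, a level equal to one hundredth of the maximum maps to -20 dB//max.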

The large x marked on Figures 3-5 is the approximate location of the point of maximum likelihood. The locations of the mean vectors of the trained Gaussian components are marked with squares. Note that in class 1 the point of maximum likelihood does not coincide with the mean vector of any of the Gaussian components.

Scatter  plots  of  the  training  data  have  been  superimposed  on  the  contours  in 
Figures  3-5.  Sample  data  are  marked  with  simple  dots.  In  each  class  virtually  all 
the  training  data  lies  within  the  -20  dB//max  contour.  Since  the  -20  dB//max 
contours  of  the  classes  intersect  and  their  maxima  do  not  greatly  differ,  perfect 
separation  of  the  three  classes  is  not  achieved. 

Although two Gaussians were permitted for class 3, GF training merged them by superimposing their means, as is seen from Table 1. Merging indicates that class 3 is overmodeled, that is, $G_3 = 2$ is too large. The common covariance matrix structure is easily seen in the PDF level curves for class 3, depicted in Figure 5. Fitting a single Gaussian to only the class 3 samples would give a slightly different covariance matrix. The difference is due to pooling samples across classes; i.e., the covariance matrix $\Sigma$ depends jointly on the entire training set T, not just on the samples for any one class.

The decision risk $\rho(X)$ gives much more insight into the class structure than the discriminant curve alone. Figure 6 depicts the risk $\rho(X)$ in dB//max, where the maximum risk is $4.94 \times 10^{-5}$. The region of greatest decision risk lies between classes 2 and 3, which agrees well with the good visual separation between the scatter plot of class 1 and those of the other two classes. The risk $\rho(X)$ thus not only confirms, but also quantifies, our intuition in the matter. Note that the discriminant curve runs along the ridges of the graph of $\rho(X)$.

The decision assurance $\delta(X)$ is depicted in Figure 7. The maximum assurance is $5.47 \times 10^{-4}$. The assurance function $\delta(X)$ is useful as an indicator of the correctness of the optimum decision. An outlier is easily classified by noting its position relative to the discriminant curve, and, as can be seen from Figure 6, the risk $\rho(X)$ associated with this decision is very small. Nonetheless, one should not feel too comfortable with any decision concerning an outlier. Screening outliers is accepted statistical practice, and screening can be facilitated by setting a threshold on the assurance $\delta(X)$: if $\delta(X)$ is not sufficiently large, then no classification decision is made. This is equivalent to postulating an additional “null” class having an appropriate diffuse PDF. Note that the discriminant curve runs down the valleys of the graph of $\delta(X)$.
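The screening rule can be sketched as a reject option on $\delta(X)$. The threshold below (30 dB below the maximum assurance reported for Example 1) is purely illustrative, and the sketch assumes zero-one costs so that the optimum decision is the class with the largest $\alpha_j g_j(X)$. `THRESHOLD` and `classify_with_reject` are hypothetical names.

```python
import numpy as np

# Hypothetical threshold: 30 dB below the maximum assurance reported for Example 1.
THRESHOLD = 5.47e-4 * 10 ** (-30 / 10)

def classify_with_reject(weighted, threshold=THRESHOLD):
    """Return the 0-based class maximizing alpha_j g_j(X), or None (the "null"
    class) when the assurance delta(X) = max_j alpha_j g_j(X) is below threshold.
    Assumes zero-one costs, so the optimum decision is the arg max."""
    weighted = np.asarray(weighted, dtype=float)
    if weighted.max() < threshold:        # delta(X) too small: make no decision
        return None
    return int(weighted.argmax())
```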

Classification performance estimates derived from the training set are, as is well known, optimistically biased. However, because overtraining has not occurred, such estimates should not be significantly biased in this instance. Classification performance estimates are given in Table 2 and are presented primarily for comparison with the next two examples. Note that the largest off-diagonal entry of the confusion matrix corresponds to misclassifying class 2 samples as class 3. This also corresponds to the region of greatest risk $\rho(X)$.


                               Input Class
                      Class 1          Class 2          Class 3
Sample Size
  Input               1960             720              500
  Training            1960             720              500
Decision
  Class 1             98.32% (1927)     1.67% (12)       0.20% (1)
  Class 2              0.97% (19)      93.61% (674)      0.80% (4)
  Class 3              0.71% (14)       4.72% (34)      99.00% (495)

Table 2: Confusion Matrix for Example 1
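The percentages in Table 2 are the counts in each column divided by the corresponding input-class sample size. The sketch below treats rows as decisions and columns as input classes, as in the table above; it reproduces the printed percentages and locates the largest off-diagonal entry (decision class 3, input class 2). The helper names are hypothetical.

```python
import numpy as np

# Counts from Table 2 (rows: decision, columns: input class).
counts = np.array([[1927,  12,   1],
                   [  19, 674,   4],
                   [  14,  34, 495]])

def column_percentages(counts):
    """Percent of each input class (column) assigned to each decision (row)."""
    counts = np.asarray(counts, dtype=float)
    return 100.0 * counts / counts.sum(axis=0, keepdims=True)

def largest_off_diagonal(counts):
    """0-based (decision, input) index pair of the largest misclassification rate."""
    pct = column_percentages(counts)
    np.fill_diagonal(pct, -np.inf)        # ignore correct decisions
    return np.unravel_index(np.argmax(pct), pct.shape)
```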


5.2 Example 2: Moderate size training set, T/10


Example 2 trains on the set T/10, a fixed (randomly selected) subset of T with 196, 72, and 50 training samples representing classes 1, 2, and 3, respectively. Given the choices of $\{G_j\}$ in Example 1, the choices $G_1 = 4$, $G_2 = 2$, and $G_3 = 1$ are made in this example to reflect the reduced training set size. The trained mixing proportions and mean vectors of the class mixture PDFs are listed in Table 3. The inverse of the trained covariance matrix is

$$\Sigma^{-1} = 10^{-3} \times \begin{bmatrix} 12.345 & -3.2255 \\ -3.2255 & 5.8022 \end{bmatrix}.$$

The eigenvalues of $\Sigma$ are 73.164 and 223.24 (a condition number of about 3.05), so $\Sigma$ is well conditioned numerically.

The discriminant boundary and decision risk $\rho(X)$ are depicted in Figure 8. The maximum risk is $4.23 \times 10^{-5}$. The discriminant boundary is effectively piecewise linear in this example because the reduced model orders $\{G_j\}$ make the GF discriminant more prone to locally flat behavior, as described above in Section III.

The risk function $\rho(X)$ has two discernible peaks (local maxima) that correspond to binary decision problems (i.e., two-class problems). There was only one peak in Example 1.

The decision assurance in dB//max is depicted in Figure 9. The maximum assurance is $4.15 \times 10^{-4}$. The covariance matrix structure, discernible in the elliptically shaped likelihood contours of Figure 9, is slightly rotated from that of Example 1, but this difference does not significantly alter the overall class likelihood distributions. It is interesting to note that the point of intersection of the three arms of the GF discriminant lies in a small valley (i.e., a local minimum) of $\delta(X)$.

Classification performance was estimated on the full training set T. The confusion matrix is given in Table 4. As is evident from Table 4, GF training on the reduced-size set T/10 gives excellent classification performance. Note that class 3 is never misclassified as class 1. Correct classification rates are slightly lower than those for Example 1, possibly because the larger testing set has reduced the small positive bias evident in the confusion matrix of Example 1. Note that the largest off-diagonal entry in the confusion matrix corresponds to the largest peak in the risk $\rho(X)$, and the second largest off-diagonal entry corresponds to the second largest peak.


5.3 Example 3: Small size training set, T/100


Example 3 trains on the set T/100, a fixed (randomly selected) subset of T/10 with 20, 7, and 5 training samples representing classes 1, 2, and 3, respectively. The numbers of components per class are further reduced to $G_1 = 2$, $G_2 = 1$, and $G_3 = 1$. The trained mixing proportions and mean vectors of the class mixture PDFs are listed in Table 5. The inverse of the trained covariance matrix is

$$\Sigma^{-1} = 10^{-3} \times \begin{bmatrix} 11.230 & 4.9816 \\ 4.9816 & 8.8180 \end{bmatrix}.$$


Table  3:  Mixing  Proportions  and  Mean  Vectors  for  Example  2 


                               Input Class
                      Class 1          Class 2          Class 3
Sample Size
  Input               1960             720              500
  Training             196              72               50
Decision
  Class 1             97.19% (1905)     1.53% (11)       0.00% (0)
  Class 2              1.79% (35)      92.64% (667)      1.80% (9)
  Class 3              1.02% (20)       5.83% (42)      98.20% (491)

Table 4: Confusion Matrix for Example 2


The eigenvalues of $\Sigma$ are 66.010 and 204.149 (a condition number of about 3.09), so $\Sigma$ is well conditioned numerically.

The discriminant boundary and decision risk are depicted in Figure 10. The three arms of the discriminant boundary are nearly linear in this example because of the small model orders $\{G_j\}$. All three class decision regions are unbounded in this example, whereas the class 3 region is bounded in the other two examples. The risk function $\rho(X)$ has only one peak, and it lies in the same location as that of Example 1. The maximum decision risk is $8.72 \times 10^{-5}$.

The decision assurance in dB//max is depicted in Figure 11. The maximum assurance is $4.57 \times 10^{-4}$. The covariance matrix structure, clearly visible in Figure 11, shows that the covariance matrix $\Sigma$ is significantly rotated from that of the other two examples. The reason is the small model order for class 1. During training, one Gaussian comes to model the most densely clustered samples, and the other models the most significant portion of the remaining samples in class 1. Since class 1 samples dominate the within-class covariance matrix calculation, and the densely clustered samples dominate class 1 (cf. the mixing proportions of Table 5), the covariance matrix $\Sigma$ reflects the dense portion of the class 1 samples. This effect is evident in Figure 11.

Classification performance was estimated on the sample set T that was used as the training set for Example 1. The confusion matrix is given in Table 6. Note that class 3 is never misclassified as class 1. Correct classification rates are very similar to those of Example 2. The largest off-diagonal entry in the confusion matrix corresponds to the peak of $\rho(X)$, just as in the other examples. GF training on the greatly reduced training set T/100 thus appears to perform nearly as well as training on the entire set T.

                  Class 1                           Class 2                           Class 3
Component   π_i1          μ_i1                π_i2          μ_i2                π_i3          μ_i3
1           0.757665      (106.663, 71.176)   1.00000       (61.286, 15.857)    1.00000       (48.823, 49.063)
2           (illegible)   (106.816, 109.032)

Table 5: Mixing Proportions and Mean Vectors for Example 3


                               Input Class
                      Class 1          Class 2          Class 3
Sample Size
  Input               1960             720              500
  Training              20               7                5
Decision
  Class 1             97.65% (1914)     1.11% (8)        0.00% (0)
  Class 2              0.46% (9)       91.81% (661)      0.80% (4)
  Class 3              1.89% (37)       7.08% (51)      99.20% (496)

Table 6: Confusion Matrix for Example 3


6 Tolerance Intervals for Assessing Classification Performance

Example 3 is one realization of a larger “experiment” conducted by randomly selecting subsets of specified size from the training set T. In this section we assess quantitatively how representative this example was of the larger experiment by using a nonparametric statistical method known as tolerance intervals. Tolerance intervals are similar to confidence intervals in their use, but they are defined and derived very differently.

We define a training trial on a fixed-size subset of a given labelled training set T in the following manner. First, a (uniform) random sample S of the specified size is drawn from T without replacement; S is returned to T before the next training trial begins. Next, GF training is conducted on the set S. The initial parameter set $\Lambda$ required by GF training may be fixed or generated randomly, but the same initialization procedure must be used in all training trials. On convergence of the GF training algorithm, classification performance is assessed on the set $T \setminus S$. In the examples presented in Section V, classification performance is measured by a confusion matrix. In this section we also consider the total error rate.
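A training trial and the resulting distribution-free interval can be sketched as follows. `train_fn` and `error_fn` are hypothetical placeholders for GF training and for scoring a trained classifier, and the helper names are likewise hypothetical. Reading an entry such as 3.6 ± 1.1 as the midpoint and half-width of the range [min, max] over the trials is an assumption about how the intervals are reported.

```python
import numpy as np

def training_trial(X, y, subset_size, train_fn, error_fn, rng):
    """One training trial on the labelled set T = (X, y).

    S is a uniform random sample of subset_size points drawn without
    replacement; T itself is never modified, so S is "returned" to T.
    train_fn(X_S, y_S) -> model and error_fn(model, X, y) -> error rate
    are hypothetical placeholders (e.g. GF training and a classifier test).
    """
    n = len(X)
    in_s = np.zeros(n, dtype=bool)
    in_s[rng.choice(n, size=subset_size, replace=False)] = True
    model = train_fn(X[in_s], y[in_s])
    return error_fn(model, X[~in_s], y[~in_s])     # assessed on T \ S

def tolerance_interval(outcomes):
    """Distribution-free interval [min, max], returned as (center, half_width)."""
    lo, hi = float(np.min(outcomes)), float(np.max(outcomes))
    return (lo + hi) / 2.0, (hi - lo) / 2.0
```

Calling `training_trial` n times with independent draws from one generator and passing the outcomes to `tolerance_interval` gives the univariate interval for the total error rate.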

Training trials are independent trials of a multivariate discrete random variable. The trials are independent because the subsets S drawn from T are independent, and the trials have discrete outcomes because only a finite number of different subsets S can be drawn from T. Finally, the trials are multivariate because the outcome is the calculated confusion matrix together with the total error rate. In principle, the PDF of the training trials can be found exactly by systematically running through the entire list of all possible subsets S of the training set T; however, except for very small training sets T, such a procedure is computationally prohibitive.

Suppose momentarily that the total error rate, denoted by $\epsilon$, is the only outcome of a training trial and that its PDF is continuous, not discrete. Denote the PDF of $\epsilon$ by $E(\epsilon)$, and let n independent training trials with outcomes $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ be given. Then the population fraction, or coverage, u of the PDF $E(\epsilon)$ between $\min_k \epsilon_k$ and $\max_k \epsilon_k$ is given exactly by

$$u = \int_{\min_k \epsilon_k}^{\max_k \epsilon_k} E(z)\, dz.$$

Wilks shows [10] that the PDF of u, denoted $P_n(u)$, is independent of the PDF $E(\epsilon)$ and is equal to

$$P_n(u) = n(n-1)\, u^{n-2} (1-u).$$

Robbins shows [11] that order statistics are the only statistics that yield distribution-free tolerance intervals. If we want coverage $u \ge \beta$ (that is, at least $100\beta\%$ of the population) with probability $100\alpha\%$, then n must be chosen so that

$$\Pr\{u \ge \beta\} = \alpha = \int_{\beta}^{1} P_n(u)\, du. \qquad (11)$$

The PDF of the total error rate for a training trial is discrete. The result by Wilks is applicable to univariate discrete PDFs, but (11) changes to $\Pr\{u \ge \beta\} \ge \alpha$ in this case. For a proof of Wilks’ result for discrete PDFs, see [12].
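Integrating $P_n(u)$ in (11) gives the closed form $\Pr\{u \ge \beta\} = 1 - n\beta^{n-1} + (n-1)\beta^{n}$. The sketch below uses it to find the smallest number of trials. It covers only a single scalar outcome such as the total error rate (for a discrete outcome the value is a lower bound on the confidence); the value n = 50 used in this section comes from the multivariate curves in [16], which require more trials. Function names are hypothetical.

```python
def coverage_confidence(n, beta):
    """Pr{u >= beta} for the coverage u of [min, max] of n i.i.d. continuous
    outcomes: the integral of n(n-1) u^(n-2) (1-u) from beta to 1."""
    return 1.0 - n * beta ** (n - 1) + (n - 1) * beta ** n

def min_trials(beta, alpha):
    """Smallest n for which coverage >= beta holds with probability >= alpha."""
    n = 2
    while coverage_confidence(n, beta) < alpha:
        n += 1
    return n
```

For $\beta = 0.90$ and $\alpha = 0.95$ the univariate requirement is met with n = 46, and with n = 50 the univariate confidence is about 0.966.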

In general, a training trial outcome includes the confusion matrix and the total error rate. The PDF of the confusion matrix is discrete and multivariate. Wald [13] derives distribution-free tolerance intervals for continuous multivariate outcomes by computing order statistics on each vector component separately and by carefully choosing a multidimensional interval (block). Tukey extends Wald’s results to more general choices of multidimensional blocks [14] and to discrete multivariate PDFs [15]. The curves given in [16] are valid for continuous multivariate PDFs. These curves may be used to obtain lower bounds on the confidence $\alpha$ for discrete multivariate PDFs [15], just as in the univariate case.

From the curves in [16], taking n equal to 50 ensures coverage $90\% = 100\beta\%$ and confidence $95\% = 100\alpha\%$. Results for n = 50 independent trials of the experiments T/10 and T/100 are shown in Table 7 and Table 8, respectively. The corresponding tolerance intervals for the total error rate, in percent, are 3.6 ± 1.1 and 8.4 ± 5.3 for T/10 and T/100, respectively. The tolerance intervals for the confusion matrices were obtained using Wald’s method [13] for selecting the multidimensional block. Table 7 shows that GF training yields good class characterization when a T/10 training set is used. The largest tolerance intervals are for class 2 because class 2 data overlap both class 1 and class 3 data. Table 8 shows that GF training does not perform as well on T/100 training sets. In particular, large performance variations are possible when trying to distinguish class 2 data from class 3 data. Table 8 clearly shows that the excellent results obtained in Example 3 are at the high end of the tolerance interval (block) for the experiment T/100.


7  Concluding  Remarks 


The examples of Sections V and VI give convincing evidence of the utility of GF training. Although the classes in these examples are nearly linearly separable (cf. Figure 10 and Table 6), the important translation property discussed in Section IV implies that linear separability is not the central issue for GF training, because we can always train on widely separated translates of the individual class training sets $T_j$. The examples show that GF training has obtained very reasonable estimates of the class conditional PDFs; that is, the available class training samples have been generalized in some sense. Highly nonlinear discriminants are by-products of good class conditional PDF estimates.

A “decision directed” generalized Fisher (DDGF) method can be used, if desired, to obtain a strictly piecewise linear approximation to the GF discriminant surface. The DDGF method classifies an input vector X into the class $j^*$ such that

$$(j^*, i^*) = \arg\max_{j,\,i}\, \{\alpha_j \pi_{ij}\, p_{ij}(X)\}.$$

The component decision $i^*$ is also part of the DDGF decision. If the required maximum is taken first over the component index i and then over the class index j, it is seen that the DDGF method is equivalent to a two-stage decision and is implementable in a feed-forward NN structure that avoids using exponential nonlinearities. In the first stage, a “within-class” decision determines which component generated the given input vector; there are as many within-class decisions as there are classes. In the second stage, a final “between-class” decision is made using the representative class components determined by the first stage.


                        Input Class
Decision (%)     Class 1        Class 2        Class 3
Class 1          97.1 ± 1.2      1.2 ± 0.6      0.0 ± 0.0
Class 2           1.5 ± 0.7     92.1 ± 2.5      1.2 ± 1.2
Class 3           1.4 ± 0.9      6.6 ± 2.2     98.8 ± 1.2

Table 7: Tolerance Intervals for T/10 Experiment


Both the within-class and between-class decisions have piecewise linear discriminants because all the Gaussian components share the common covariance matrix $\Sigma$, so the quadratic terms in X cancel in every comparison. The within-class mixing proportions $\{\pi_{ij}\}$ are the a priori probabilities for the within-class decisions, while the between-class mixing proportions $\{\alpha_j\}$ are used for the between-class decision.
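Because every component shares $\Sigma$, the DDGF decision can be computed with linear scores only, as the sketch below illustrates; the function and argument names are hypothetical. It drops the term $-\tfrac{1}{2}X^t\Sigma^{-1}X$ and the Gaussian normalizing constant, which are common to all components, then takes the within-class maximum followed by the between-class maximum. The logarithms of the mixing proportions are constants that could be precomputed, so no exponential is evaluated per input.

```python
import numpy as np

def ddgf_decide(x, alphas, pis, mus, sigma_inv):
    """Two-stage DDGF decision using linear scores (no exponentials).

    alphas[j]: between-class mixing proportion alpha_j
    pis[j]:    array of within-class proportions pi_ij for class j
    mus[j]:    (G_j, N) array of component means mu_ij
    sigma_inv: common inverse covariance Sigma^{-1}
    Returns the 0-based pair (j_star, i_star).
    """
    best_j, best_i, best_score = -1, -1, -np.inf
    for j, (alpha, pi, mu) in enumerate(zip(alphas, pis, mus)):
        a = mu @ sigma_inv                            # row i: mu_ij^t Sigma^{-1}
        # log(pi_ij p_ij(x)) minus terms shared by all (i, j); linear in x
        scores = np.log(pi) + a @ x - 0.5 * np.sum(a * mu, axis=1)
        i = int(np.argmax(scores))                    # stage 1: within-class decision
        score = np.log(alpha) + scores[i]
        if score > best_score:                        # stage 2: between-class decision
            best_j, best_i, best_score = j, i, score
    return best_j, best_i
```

The result agrees with maximizing $\alpha_j \pi_{ij} p_{ij}(X)$ jointly over (j, i), because the dropped terms shift every score by the same amount.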

An interesting aspect of GF training is that, because only one covariance matrix $\Sigma$ is used across all classes, the principal components analysis (PCA) based on $\Sigma$ is common to all M classes. A common PCA, taken together with the spread of the component means, is a potentially useful tool for investigating dimensional reduction of the feature space in all classes simultaneously. This unique aspect of GF training merits further study.
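A common PCA reduces to an eigendecomposition of the single matrix $\Sigma$. The sketch below (hypothetical helper names) orders the principal axes by variance and projects points, here the first four class 1 component means listed in Table 1, onto the leading axis. It uses the Example 1 covariance obtained by inverting the printed $\Sigma^{-1}$; it is an illustration of the idea, not a procedure from the report.

```python
import numpy as np

def common_pca(sigma):
    """Principal variances and axes of the common covariance Sigma, largest first."""
    eigvals, eigvecs = np.linalg.eigh(sigma)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def project(points, eigvecs, k):
    """Coordinates of the rows of points on the first k principal axes."""
    return np.asarray(points, dtype=float) @ eigvecs[:, :k]

sigma_ex1 = np.linalg.inv(1e-3 * np.array([[17.590, -0.76978],
                                           [-0.76978, 6.0786]]))
variances, axes = common_pca(sigma_ex1)
# Class 1 component means 1-4 from Table 1, projected on the leading axis.
means = np.array([[85.165, 111.075], [115.670, 43.677],
                  [115.499, 79.179], [98.808, 88.775]])
scores = project(means, axes, 1)
```

The spread of `scores` along the retained axes is one way to judge whether a reduced feature space still separates the components.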

The GF training algorithm derived in the Appendix easily accommodates several useful extensions. Three extensions are mentioned here. Supervised/unsupervised GF training can be undertaken on training sets in which some of the training samples are unlabelled. Unsupervised GF training is the special case of all unlabelled training data. This extension is especially useful for applications in which the cost or difficulty of correctly labelling all the training samples is prohibitive. GF training iterations can be made adaptive and run “closed loop” if the class PDFs are time-varying. This extension requires reformulating the GF likelihood function with Bayesian prior distributions [8] for the mixing proportions, mean vectors, and covariance matrix of the class conditional mixture PDFs. Adaptive GF training is potentially useful in applications in which class statistics either are non-stationary or are treated as non-stationary to ensure robustness and a degree of fault tolerance.


                        Input Class
Decision (%)     Class 1        Class 2        Class 3
Class 1          96.0 ± 3.0      5.8 ± 5.0     (illegible)
Class 2           1.4 ± 1.4     82.8 ± 10       5.2 ± 5.2
Class 3           2.5 ± 1.6     21.3 ± 16      97.5 ± 2.3

Table 8: Tolerance Intervals for T/100 Experiment


GF training can be extended to mixtures of discrete PDFs and continuous non-Gaussian PDFs [7]. These extensions may enable reduced PNN size (because of the increased modelling efficiency) in applications requiring discrete feature vectors or continuous non-Gaussian feature vectors. These extensions are not mutually exclusive. For example, adaptive GF training is possible with supervised/unsupervised training sets T.


A Appendix: Derivation of the GF Training Algorithm


The Generalized Fisher (GF) training algorithm is based on the Expectation-Maximization (EM) method described in reference [7]. The EM method consists of two steps: the first is called the expectation step, or E-step, and the second is called the maximization step, or M-step. The E-step extends the likelihood function $\mathcal{L}$ to the unobserved, or “missing,” data and then computes the expectation of the logarithm of this extended likelihood over the missing data to obtain an auxiliary function Q. The M-step maximizes the function Q with respect to the parameter set to be estimated. Reference [7] describes the conditions required for the EM method to converge to a local maximum of the likelihood function $\mathcal{L}$.

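Suppose independent samples of a random vector X with dimension N are observed, where X has a mixture-of-mixtures conditional PDF given by

$$X \sim f(X|\Lambda) = \sum_{j=1}^{M} \alpha_j\, g_j(X|\Lambda_j), \qquad (12)$$

where

$$\sum_{j=1}^{M} \alpha_j = 1, \qquad (13)$$

$$g_j(X|\Lambda_j) = \sum_{i=1}^{G_j} \pi_{ij}\, p_{ij}(X|\Lambda_{ij}), \qquad (14)$$

$$p_{ij}(X|\Lambda_{ij}) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(X - \mu_{ij})^t\, \Sigma^{-1} (X - \mu_{ij})\right], \qquad (15)$$

$$\sum_{i=1}^{G_j} \pi_{ij} = 1, \quad j = 1, \ldots, M, \qquad (16)$$

and the symbol $X^t$ denotes the transpose of X. The number of components $G_j$ can be different for different class mixtures. Note that the between-class mixing proportions $\{\alpha_j\}$ and the within-class mixing proportions $\{\pi_{ij}\}$ are contained in the interval [0, 1].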


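The parameter sets

$$\Lambda_{ij} = \{\mu_{ij}, \Sigma\}, \qquad (17)$$

$$\Lambda_j = \{\pi_{ij}, \Lambda_{ij}\}_{i=1}^{G_j}, \qquad (18)$$

$$\Lambda = \{\alpha_j, \Lambda_j\}_{j=1}^{M} \qquad (19)$$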

are unknown. In this appendix the GF training algorithm is derived for estimating the unknown parameter set $\Lambda$ from the training set $\mathcal{T}$; the derivation is based on the EM method. The following discussion develops the expectation step (E-step) and the maximization step (M-step) of the EM method applied to the mixture-of-mixtures PDF model in equation (12). The training set $\mathcal{T}$ is partitioned (labelled) so that, for each component $g_j(X|\Lambda_j)$ of the mixture $f(X|\Lambda)$, $T_j$ of the observations of X are from $g_j(X|\Lambda_j)$:

$$\mathcal{T} = \left\{ \{X_{kj}\}_{k=1}^{T_j} \right\}_{j=1}^{M}, \qquad (20)$$

where

$$T = \sum_{j=1}^{M} T_j. \qquad (21)$$

The posterior likelihood function for the unlabelled training set $\mathcal{T}$ is

$$\mathcal{L}_U(\mathcal{T}|\Lambda) = \binom{T}{T_1 \cdots T_M} \prod_{n=1}^{T} \left( \sum_{j=1}^{M} \alpha_j\, g_j(X_n|\Lambda_j) \right), \qquad (22)$$


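using equation (12) and the independence of the training samples. The multinomial coefficient is required in equation (22) because the training set $\mathcal{T}$ is unordered. Since the multinomial coefficient is a constant that only scales the likelihood function, it is dropped for the rest of this discussion. The likelihood function of the labelled training set $\mathcal{T}$ becomes

$$\mathcal{L}(\mathcal{T}|\Lambda) = \prod_{j=1}^{M} \prod_{k=1}^{T_j} \left( \sum_{l=1}^{M} \delta(j-l)\, \alpha_l\, g_l(X_{kj}|\Lambda_l) \right) = \prod_{j=1}^{M} \prod_{k=1}^{T_j} \alpha_j\, g_j(X_{kj}|\Lambda_j), \qquad (23)$$

where $\delta(\cdot)$ is the Kronecker delta function. Substituting equation (14) in equation (23) yields

$$\mathcal{L}(\mathcal{T}|\Lambda) = \prod_{j=1}^{M} \prod_{k=1}^{T_j} \alpha_j \left( \sum_{l=1}^{G_j} \pi_{lj}\, p_{lj}(X_{kj}|\Lambda_{lj}) \right). \qquad (24)$$

If $\alpha_j = 0$ for some j, then $\mathcal{L}(\mathcal{T}|\Lambda) = 0$. Therefore, we require that $\alpha_j > 0$, $j = 1, \ldots, M$.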

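The missing data in this problem are the indices i of the PDFs $p_{ij}(\cdot|\Lambda_{ij})$ within the mixtures $g_j(\cdot)$ from which the samples $X_{kj}$ originated. The “complete data” in this case is the set

$$\Gamma = \left\{ \{(X_{kj}, i_{kj})\}_{k=1}^{T_j} \right\}_{j=1}^{M} = \mathcal{T} \cup \mathcal{I}, \qquad (25)$$

where $i_{kj}$ denotes the component index of the PDF from which $X_{kj}$ was drawn. Note that $i_{kj}$ is not observed and that $1 \le i_{kj} \le G_j$. The conditional PDF for $\Gamma$ is

$$F(\Gamma|\Lambda) = \prod_{j=1}^{M} \prod_{k=1}^{T_j} \alpha_j \sum_{l=1}^{G_j} \pi_{lj}\, p_{lj}(X_{kj}|\Lambda_{lj})\, \delta(l - i_{kj}) = \prod_{j=1}^{M} \prod_{k=1}^{T_j} \alpha_j\, \pi_{lj}\, p_{lj}(X_{kj}|\Lambda_{lj}) \Big|_{l = i_{kj}}, \qquad (26)$$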

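where $\delta(\cdot)$ is again the Kronecker delta function. The PDF of $\mathcal{I} = \{i_{kj}\}$ conditioned on $\mathcal{T}$ and $\Lambda$ is then

$$K(\mathcal{I}|\mathcal{T},\Lambda) = \frac{F(\Gamma|\Lambda)}{\mathcal{L}(\mathcal{T}|\Lambda)}. \qquad (27)$$

Substituting (23) and (26) into K yields

$$K(\mathcal{I}|\mathcal{T},\Lambda) = \prod_{j=1}^{M} \prod_{k=1}^{T_j} \frac{\pi_{lj}\, p_{lj}(X_{kj}|\Lambda_{lj})}{g_j(X_{kj}|\Lambda_j)} \Big|_{l=i_{kj}} \qquad (28)$$

$$= \prod_{j=1}^{M} \prod_{k=1}^{T_j} w_{lj}(X_{kj}) \Big|_{l=i_{kj}}, \qquad (29)$$

where

$$w_{ij}(X) = \frac{\pi_{ij} \exp\left[-\tfrac{1}{2}(X - \mu_{ij})^t\, \Sigma^{-1} (X - \mu_{ij})\right]}{\sum_{l=1}^{G_j} \pi_{lj} \exp\left[-\tfrac{1}{2}(X - \mu_{lj})^t\, \Sigma^{-1} (X - \mu_{lj})\right]}, \quad X = X_{kj}. \qquad (30)$$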


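Note that $w_{ij}(X_{kj}) \ge 0$, and $w_{ij}(X_{kj}) = 0$ if and only if $\pi_{ij} = 0$. It is straightforward to verify that

$$\sum_{\mathcal{I}} K(\mathcal{I}|\mathcal{T},\Lambda) = 1, \qquad (31)$$

where the sum over $\mathcal{I}$ is shorthand for the T-fold sum

$$\sum_{\mathcal{I}} = \sum_{i_{11}=1}^{G_1} \cdots \sum_{i_{T_1 1}=1}^{G_1}\, \sum_{i_{12}=1}^{G_2} \cdots \sum_{i_{T_2 2}=1}^{G_2} \cdots \sum_{i_{1M}=1}^{G_M} \cdots \sum_{i_{T_M M}=1}^{G_M}. \qquad (32)$$

Similarly,

$$\sum_{\mathcal{I} \setminus i_{kj}} K(\mathcal{I}|\mathcal{T},\Lambda) = w_{lj}(X_{kj}) \Big|_{l=i_{kj}}, \qquad (33)$$

where the sum over $\mathcal{I} \setminus i_{kj}$ is the same as the sum over $\mathcal{I}$ except that the sum over the index $i_{kj}$ is deleted. Note that $K(\cdot)$ defines a probability on the discrete space of indices $\mathcal{I}$.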
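Equations (31) and (33) can be checked numerically by brute-force enumeration on a toy problem. The sketch below restricts attention to a single class, which suffices because K factors over the classes, and computes $K(\mathcal{I}|\mathcal{T},\Lambda)$ directly as the ratio (27); the $\alpha_j$ factors and the Gaussian normalizing constants are omitted because they cancel. The function names are hypothetical, and any data fed to them (for instance, three samples and two components, giving $2^3 = 8$ index vectors) are synthetic.

```python
import itertools
import numpy as np

def kernel(x, mu, sigma_inv):
    """exp(-(x - mu)^t Sigma^{-1} (x - mu) / 2); normalizing constant omitted."""
    d = x - mu
    return float(np.exp(-0.5 * d @ sigma_inv @ d))

def index_pmf(samples, pis, mus, sigma_inv):
    """K(I|T, Lambda) = F(Gamma|Lambda) / L(T|Lambda) for one class, for every I."""
    G = len(pis)
    like = np.prod([sum(pis[l] * kernel(x, mus[l], sigma_inv) for l in range(G))
                    for x in samples])                          # L(T|Lambda)
    pmf = {}
    for I in itertools.product(range(G), repeat=len(samples)):
        joint = np.prod([pis[i] * kernel(x, mus[i], sigma_inv)
                         for x, i in zip(samples, I)])          # F(Gamma|Lambda)
        pmf[I] = joint / like
    return pmf

def posterior_weights(x, pis, mus, sigma_inv):
    """w_i(x) of equation (30) for one sample x."""
    v = np.array([p * kernel(x, m, sigma_inv) for p, m in zip(pis, mus)])
    return v / v.sum()
```

Summing `index_pmf` over all index vectors should give 1, as in (31), and summing over every index except the k-th should give `posterior_weights` of sample k, as in (33).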


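The E-step of the EM method is defined to be

$$Q(\Lambda|\Lambda') = E\{\log F(\Gamma|\Lambda) \,|\, \mathcal{T}, \Lambda'\}, \qquad (34)$$

where the expectation is taken over the set of all possible indices $\mathcal{I} = \{i_{kj}\}$ and is conditioned on $\mathcal{T}$ and $\Lambda'$, where $\Lambda'$ is a given parameter set of the form (19). By definition of expectation,

$$Q(\Lambda|\Lambda') = \sum_{\mathcal{I}} \log[F(\Gamma|\Lambda)]\, K(\mathcal{I}|\mathcal{T},\Lambda') \qquad (35)$$

$$= \sum_{\mathcal{I}} \sum_{j=1}^{M} \sum_{k=1}^{T_j} \log\left[\alpha_j\, \pi_{lj}\, p_{lj}(X_{kj}|\Lambda_{lj})\right]\Big|_{l=i_{kj}} K(\mathcal{I}|\mathcal{T},\Lambda') \qquad (36)$$

$$= \sum_{\mathcal{I}} \sum_{j=1}^{M} \sum_{k=1}^{T_j} \log(\alpha_j)\, K(\mathcal{I}|\mathcal{T},\Lambda') + \sum_{\mathcal{I}} \sum_{j=1}^{M} \sum_{k=1}^{T_j} \log\left[\pi_{lj}\, p_{lj}(X_{kj}|\Lambda_{lj})\right]\Big|_{l=i_{kj}} K(\mathcal{I}|\mathcal{T},\Lambda'). \qquad (37)$$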


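The first term simplifies easily using equation (31) because

$$\sum_{\mathcal{I}} \sum_{j=1}^{M} \sum_{k=1}^{T_j} \log(\alpha_j)\, K(\mathcal{I}|\mathcal{T},\Lambda') = \left( \sum_{j=1}^{M} \sum_{k=1}^{T_j} \log \alpha_j \right) \sum_{\mathcal{I}} K(\mathcal{I}|\mathcal{T},\Lambda') \qquad (38)$$

$$= \sum_{j=1}^{M} T_j \log \alpha_j. \qquad (39)$$

Using equation (33), the second term in equation (37) simplifies to

$$\sum_{\mathcal{I}} \sum_{j=1}^{M} \sum_{k=1}^{T_j} \log\left[\pi_{lj}\, p_{lj}(X_{kj}|\Lambda_{lj})\right]\Big|_{l=i_{kj}} K(\mathcal{I}|\mathcal{T},\Lambda') = \sum_{j=1}^{M} \sum_{k=1}^{T_j} \sum_{l=1}^{G_j} \log\left[\pi_{lj}\, p_{lj}(X_{kj}|\Lambda_{lj})\right] \sum_{\mathcal{I} \setminus i_{kj}} K(\mathcal{I}|\mathcal{T},\Lambda') \Big|_{i_{kj}=l} \qquad (40)$$

$$= \sum_{j=1}^{M} \sum_{k=1}^{T_j} \sum_{i=1}^{G_j} \log\left[\pi_{ij}\, p_{ij}(X_{kj}|\Lambda_{ij})\right] w'_{ij}(X_{kj}), \qquad (41)$$

where

$$w'_{ij}(X_{kj}) = \frac{\pi'_{ij} \exp\left[-\tfrac{1}{2}(X_{kj} - \mu'_{ij})^t (\Sigma')^{-1} (X_{kj} - \mu'_{ij})\right]}{\sum_{l=1}^{G_j} \pi'_{lj} \exp\left[-\tfrac{1}{2}(X_{kj} - \mu'_{lj})^t (\Sigma')^{-1} (X_{kj} - \mu'_{lj})\right]}. \qquad (42)$$

Therefore, from equations (37), (39), and (41),

$$Q(\Lambda|\Lambda') = \sum_{j=1}^{M} T_j \log \alpha_j + \sum_{j=1}^{M} \sum_{i=1}^{G_j} \sum_{k=1}^{T_j} \log\left[\pi_{ij}\, p_{ij}(X_{kj}|\Lambda_{ij})\right] w'_{ij}(X_{kj}). \qquad (43)$$

Note that if $\pi_{ij} = 0$ for some i and j (with $\pi'_{ij} > 0$), then $Q(\Lambda|\Lambda') = -\infty$.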

The M-step of the EM method is achieved by maximizing $Q(\Lambda|\Lambda')$ with respect to the parameter set $\Lambda$, given the previous estimate $\Lambda'$. In [17], Juang proves that maximizing Q is equivalent to maximizing $\mathcal{L}$; hence, an iterative procedure for maximizing Q over the parameter set $\Lambda$ will cause the likelihood function $\mathcal{L}$ to increase monotonically. This maximization problem is solved either by differentiating Q with respect to each parameter in the set $\Lambda$, setting the resulting partial derivatives equal to zero, and then solving for each parameter, or by the method of Lagrange multipliers if parameter constraints are necessary. In the following development, it turns out that the parameters $\{\alpha_j\}$, $\{\pi_{ij}\}$, and $\{\mu_{ij}\}$ are uncoupled from each other and from $\Sigma$ in the equations defining the necessary conditions for maximization, so they can be solved for separately. The parameter $\Sigma$ is a function of the parameters $\{\mu_{ij}\}$.

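The expression for Q in equation (43) may be rewritten as

$$Q(\Lambda|\Lambda') = \sum_{j=1}^{M} T_j \log \alpha_j + \sum_{j=1}^{M} \sum_{i=1}^{G_j} \sum_{k=1}^{T_j} w'_{ij}(X_{kj}) \log \pi_{ij} + \sum_{j=1}^{M} \sum_{i=1}^{G_j} \sum_{k=1}^{T_j} w'_{ij}(X_{kj}) \log p_{ij}(X_{kj}|\Lambda_{ij}) \qquad (44)$$

$$= Q_\alpha + \sum_{j=1}^{M} Q_\pi^{j} + Q_p. \qquad (45)$$

Note that $Q_\alpha$ depends only on $\{\alpha_j\}$, that $Q_\pi^{j}$ depends only on $\{\pi_{ij},\ i = 1, \ldots, G_j\}$, and that $Q_p$ depends jointly on $\Sigma$ and the vectors $\{\mu_{ij},\ i = 1, \ldots, G_j,\ j = 1, \ldots, M\}$. In addition, to maximize $Q_\pi^{j}$ for each j, we require that $\pi_{ij} > 0$ for all i. Consequently, if $Q_\alpha$, $Q_\pi^{j}$, and $Q_p$ are each maximized separately, then Q is also maximized (see Juang [17] for the special case of M = 1).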

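Starting with the parameter set $\{\alpha_j\}$, $Q_\alpha$ is maximized subject to constraint (13). The appropriate Lagrangian for $\{\alpha_j\}$ is

$$\tilde{Q} = Q_\alpha + \gamma\left(\sum_{j=1}^{M} \alpha_j - 1\right), \qquad (46)$$

where $\gamma$ is the Lagrange multiplier. Differentiating with respect to $\alpha_j$ yields

$$\frac{\partial \tilde{Q}}{\partial \alpha_j} = \frac{T_j}{\alpha_j} + \gamma = 0; \qquad (47)$$

hence,

$$\alpha_j = -\frac{T_j}{\gamma}. \qquad (48)$$

Substituting into the constraint (13) yields

$$1 = \sum_{j=1}^{M} \alpha_j = -\frac{1}{\gamma}\sum_{j=1}^{M} T_j = -\frac{T}{\gamma}. \qquad (49)$$

Therefore, $\gamma = -T$, and

$$\hat{\alpha}_j = \frac{T_j}{T}. \qquad (50)$$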

Note that the estimates $\{\hat{\alpha}_j\}$ are an immediate consequence of the labelling (partitioning) of the training set $T$ and that $\hat{\alpha}_j > 0$, as required. By Lemma 2 in [10], $\{\hat{\alpha}_j\}$ is the unique global maximum of $Q_\alpha$.

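To estimate $\{\pi_{ij},\ i = 1,\ldots,G_j\}$, the term $Q_\pi^j$ is maximized subject to the constraint (16). In this case, the appropriate Lagrangian is

$$\tilde{Q} = Q_\pi^j + \gamma\left(\sum_{i=1}^{G_j} \pi_{ij} - 1\right), \qquad (51)$$

where $\gamma$ is the Lagrange multiplier. Taking the partial derivative with respect to $\pi_{ij}$ yields

$$\frac{\partial \tilde{Q}}{\partial \pi_{ij}} = \frac{1}{\pi_{ij}} \sum_{k=1}^{T_j} w_{ij}(X_{kj}) + \gamma = 0, \qquad (52)$$

or

$$\pi_{ij} = -\frac{1}{\gamma} \sum_{k=1}^{T_j} w_{ij}(X_{kj}). \qquad (53)$$

Substituting (53) into constraint equation (16) results in

$$1 = \sum_{i=1}^{G_j} \pi_{ij} = -\frac{1}{\gamma} \sum_{i=1}^{G_j} \sum_{k=1}^{T_j} w_{ij}(X_{kj}). \qquad (54)$$

Since

$$\sum_{i=1}^{G_j} w_{ij}(X_{kj}) = 1, \qquad (55)$$

it follows that $\gamma = -T_j$. Hence, the estimate of the mixing proportion is, from equation (53),

$$\hat{\pi}_{ij} = \frac{1}{T_j} \sum_{k=1}^{T_j} w_{ij}(X_{kj}). \qquad (56)$$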

Note that $\hat{\pi}_{ij} > 0$, as required. Again, by Lemma 2 of [10], the estimate $\{\hat{\pi}_{ij}\}$ is the unique global maximum of $Q_\pi^j$.


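The new estimate of the covariance matrix $\Sigma$ is found by differentiating $Q_p$ with respect to $\Sigma = [\Sigma_{rs}]$:

$$\nabla_\Sigma Q_p = \sum_{j=1}^{M}\sum_{i=1}^{G_j}\sum_{k=1}^{T_j} \frac{w_{ij}(X_{kj})}{2}\left[-\Sigma^{-1} + \Sigma^{-1}(X_{kj}-\mu_{ij})(X_{kj}-\mu_{ij})^t\,\Sigma^{-1}\right] = 0, \qquad (57)$$

where $\nabla_\Sigma$ is defined as the matrix operator

$$\nabla_\Sigma = \left[\frac{\partial}{\partial \Sigma_{rs}}\right]. \qquad (58)$$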


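From equation (42),

$$\sum_{j=1}^{M}\sum_{k=1}^{T_j}\sum_{i=1}^{G_j} w_{ij}(X_{kj}) = \sum_{j=1}^{M}\sum_{k=1}^{T_j} 1 = \sum_{j=1}^{M} T_j = T, \qquad (59)$$

and the estimate of the covariance matrix is therefore

$$\hat{\Sigma} = \frac{1}{T}\sum_{j=1}^{M}\sum_{i=1}^{G_j}\sum_{k=1}^{T_j} w_{ij}(X_{kj})(X_{kj}-\hat{\mu}_{ij})(X_{kj}-\hat{\mu}_{ij})^t. \qquad (60)$$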

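Note that $\hat{\Sigma}$ is in the convex hull of outer products of vectors, and that the mean vectors $\hat{\mu}_{ij}$ in equation (60) are the maximum likelihood estimates of the true means. The estimate of the mean vector $\mu_{ij}$ is also found by differentiating $Q_p$:

$$\nabla_{\mu_{ij}} Q_p = \sum_{k=1}^{T_j} w_{ij}(X_{kj})\,\Sigma^{-1}(X_{kj}-\mu_{ij}) = 0. \qquad (61)$$

Note that in this case, $\mu_{ij}$ is a vector of length $N$, and that $\nabla_\mu$ is defined for general vectors $\mu = (\mu_1,\ldots,\mu_N)$ as

$$\nabla_\mu = \left[\frac{\partial}{\partial\mu_1},\ldots,\frac{\partial}{\partial\mu_N}\right]^t. \qquad (62)$$

Hence, from equation (61) the estimate of the mean vector $\mu_{ij}$ is

$$\hat{\mu}_{ij} = \frac{\sum_{k=1}^{T_j} w_{ij}(X_{kj})\,X_{kj}}{\sum_{k=1}^{T_j} w_{ij}(X_{kj})}. \qquad (63)$$

Note that the estimate $\hat{\mu}_{ij}$ is in the convex hull of the training set for label class $j$, and that $\hat{\mu}_{ij}$ is independent of the covariance matrix estimate $\hat{\Sigma}$.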

Equations (60) and (63) produce the unique global maximum of $Q_p$, provided a regularity condition is imposed on the training set $T$. For vectors $c_j \in \mathbb{R}^N$, $j = 1,\ldots,M$, define the set

$$T(c_1,\ldots,c_M) = \mathcal{H}\left[\bigcup_{j=1}^{M}\{X_{kj} + c_j : k = 1,\ldots,T_j\}\right], \qquad (64)$$

where $\mathcal{H}[\cdot]$ denotes the closed convex hull. The training set $T$ is defined to be "full rank" if the set $T(c_1,\ldots,c_M)$ has at least $N+1$ extreme points for every choice of the vectors $\{c_j\}$. The full rank assumption on $T$ guarantees that the estimated covariance matrix $\hat{\Sigma}$ given by equation (60) is positive definite.

The following theorem shows that the iteration step of the EM method is well defined for GF training. The theorem does not imply global maximum likelihood convergence of the GF training algorithm.

Theorem. If the training set $T$ is full rank, and if the mixing proportions $\pi_{ij} \ne 0$ for all $i$ and $j$, then $Q_p$ has a unique global maximum as a function of the covariance matrix $\Sigma$ and the mean vectors $\{\mu_{ij},\ i = 1,\ldots,G_j,\ j = 1,\ldots,M\}$.

Before proving this theorem, note that the full rank requirement on $T$ is not very restrictive. For example, if any one class has $N+1$ training samples and, after arbitrary translation, any $N$ of the translated samples are linearly independent, then the pooled training set $T$ is full rank. However, the set $T$ can have full rank even if none of the individual classes in the pool has full rank. This is an important consequence of pooling across classes. For instance, if $M = N$ and each class has two samples, that is, $T_j = \{X_{1j}, X_{2j}\}$, then the set $T$ is full rank if and only if the vectors $\{X_{1j} - X_{2j}\}_{j=1}^{N}$ span $\mathbb{R}^N$. Also, note that class training sets that contain only one sample do not contribute to the rank of $T$. Clearly, the geometric condition on the convex hull embodies many equivalent algebraic statements.
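For the two-samples-per-class case just described ($M = N$), the full rank test reduces to a matrix rank computation. The NumPy sketch below covers only that special case, not the general convex-hull definition (64); the function name `two_sample_full_rank` and the sample data are illustrative assumptions.

```python
import numpy as np

def two_sample_full_rank(pairs):
    """Full rank test for the special case M = N, T_j = {X_1j, X_2j}.

    pairs has shape (N, 2, N): pairs[j, 0] and pairs[j, 1] are the two samples
    of class j. Returns True when the vectors X_1j - X_2j span R^N.
    """
    pairs = np.asarray(pairs, dtype=float)
    n_dim = pairs.shape[-1]
    if pairs.shape != (n_dim, 2, n_dim):
        raise ValueError("expected N classes, each with two samples in R^N")
    diffs = pairs[:, 0, :] - pairs[:, 1, :]      # one difference vector per class
    return bool(np.linalg.matrix_rank(diffs) == n_dim)

# Differences (1, 0) and (0, 1) span the plane, although neither class is full rank alone.
print(two_sample_full_rank([[[0, 0], [-1, 0]], [[5, 5], [5, 4]]]))    # True
# Differences (1, 1) and (2, 2) are parallel, so the pooled set is not full rank.
print(two_sample_full_rank([[[0, 0], [-1, -1]], [[3, 3], [1, 1]]]))   # False
```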



The following proof of the theorem is in the spirit of Liporace's proof of a similar result [18, Appendix]. From the definition of $Q_p$ in (45) and $p_{ij}(X|\mu_{ij},\Sigma)$ in (15),

$$Q_p = \frac{1}{2}\sum_{j=1}^{M}\sum_{i=1}^{G_j}\sum_{k=1}^{T_j} w_{ij}(X_{kj})\left[-N\log(2\pi) + \log|\Lambda| - (X_{kj}-\mu_{ij})^t\Lambda(X_{kj}-\mu_{ij})\right], \qquad (65)$$

where $\Lambda = \Sigma^{-1}$. Parameterizing $Q_p$ in terms of the precision matrix $\Lambda$, instead of the covariance matrix $\Sigma$, allows the development of an explicitly negative expression for the second derivative of $Q_p$ at a critical point. Let the point $\lambda = \{\Lambda, \mu_{ij},\ i = 1,\ldots,G_j,\ j = 1,\ldots,M\}$ satisfy the necessary conditions (60) and (63) for a critical point of $Q_p$. Expressing $\lambda$ as a convex combination of two arbitrary points $\lambda^1$ and $\lambda^2$ in the domain of $Q_p$ such that $\lambda^1 \ne \lambda^2$, $\lambda^1 \ne \lambda$, and $\lambda^2 \ne \lambda$ yields

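$$\mu_{ij} = \theta\mu_{ij}^1 + (1-\theta)\mu_{ij}^2 \qquad (66)$$

and

$$\Lambda = \theta\Lambda^1 + (1-\theta)\Lambda^2. \qquad (67)$$

Note that $\theta$ in (66) and (67) is uniquely defined, is independent of the indices $i$ and $j$, and satisfies $0 < \theta < 1$. Substituting (66) and (67) into (65) and differentiating twice with respect to $\theta$ yields (after tedious calculation)

$$\frac{d^2 Q_p}{d\theta^2} = -\frac{1}{2}\sum_{j=1}^{M}\sum_{i=1}^{G_j}\sum_{k=1}^{T_j} w_{ij}(X_{kj})\left(R + R^1_{ij} + R^2_{ijk}\right), \qquad (68)$$

where

$$R = \sum_{n=1}^{N}\left[\frac{D^1_n - D^2_n}{\theta D^1_n + (1-\theta)D^2_n}\right]^2, \qquad (69)$$

$$R^1_{ij} = 2(\mu^1_{ij} - \mu^2_{ij})^t\,\Lambda\,(\mu^1_{ij} - \mu^2_{ij}), \qquad (70)$$

$$R^2_{ijk} = 4(\mu^2_{ij} - \mu^1_{ij})^t(\Lambda^1 - \Lambda^2)\left(X_{kj} - \theta\mu^1_{ij} - (1-\theta)\mu^2_{ij}\right), \qquad (71)$$

$D^1_n$ and $D^2_n$ are the diagonal entries of $U^t\Lambda^1U$ and $U^t\Lambda^2U$, respectively, and $U$ is a nonsingular matrix diagonalizing $\Lambda^1$ and $\Lambda^2$ simultaneously. Note that $R \ge 0$, and that $R = 0$ if and only if $\Lambda^1 = \Lambda^2$. (N.B. Had we not reparameterized in terms of $\Lambda$, this term would be nonpositive.) Because the training set $T$ is full rank, $\Lambda$ is positive definite; thus, $R^1_{ij} \ge 0$, with equality if and only if $\mu^1_{ij} = \mu^2_{ij}$. At least one of the terms $\{R,\ w_{ij}(X_{kj})R^1_{ij}\}$ is strictly positive because $\lambda^1 \ne \lambda^2$ and because $\pi_{ij} \ne 0$ for all $i$ and $j$ (this implies $w_{ij}(X_{kj}) > 0$). The individual terms $R^2_{ijk}$ need not vanish, but their weighted sum over all indices does; that is,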


$$\sum_{j=1}^{M}\sum_{i=1}^{G_j}\sum_{k=1}^{T_j} w_{ij}(X_{kj})R^2_{ijk} = 4\sum_{j=1}^{M}\sum_{i=1}^{G_j}(\mu^2_{ij}-\mu^1_{ij})^t(\Lambda^1-\Lambda^2)\sum_{k=1}^{T_j} w_{ij}(X_{kj})\left(X_{kj} - \theta\mu^1_{ij} - (1-\theta)\mu^2_{ij}\right) = 0, \qquad (72)$$

because of the necessary condition (61) at a critical point (recall the definition of $\mu_{ij}$ at a critical point from equation (66)). It follows that

$$\frac{d^2 Q_p}{d\theta^2} < 0. \qquad (73)$$

Hence, a critical point of $Q_p$ is a local maximum. Since $Q_p$ has a unique critical point, all that remains to be shown is that $Q_p$ attains its maximum. Because the training set $T$ is full rank, $Q_p$ is bounded above. Moreover, $Q_p \to -\infty$ uniformly as its defining parameter vector goes to the point at infinity, so the supremum of $Q_p$ must be a maximum (i.e., $Q_p$ attains its maximum).
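The second-derivative formula (68)-(71) can be spot-checked numerically. The NumPy sketch below is an illustration under assumptions: a single class with a single component, arbitrary positive numbers standing in for the weights $w_{ij}(X_{kj})$, and arbitrary positive definite $\Lambda^1$, $\Lambda^2$; all names and data are hypothetical. It evaluates $R$ through $R = \mathrm{tr}[(\Lambda^{-1}(\Lambda^1-\Lambda^2))^2]$, which equals (69) because the eigenvalues of $\Lambda^{-1}(\Lambda^1-\Lambda^2)$ are $(D^1_n - D^2_n)/(\theta D^1_n + (1-\theta)D^2_n)$, and compares (68) with a central finite difference of (65). The point need not be critical, since (68) holds for every $\theta$; only the cancellation (72) uses criticality.

```python
import numpy as np

def qp_on_segment(theta, X, w, mu1, mu2, L1, L2):
    """Q_p of (65) along the segment (66)-(67), one class and one component."""
    mu = theta * mu1 + (1 - theta) * mu2
    L = theta * L1 + (1 - theta) * L2
    N = X.shape[1]
    _, logdet = np.linalg.slogdet(L)
    d = X - mu                                     # rows are X_k - mu
    quad = np.einsum("ki,ij,kj->k", d, L, d)       # (X_k - mu)^t L (X_k - mu)
    return 0.5 * np.sum(w * (-N * np.log(2 * np.pi) + logdet - quad))

def qp_second_derivative(theta, X, w, mu1, mu2, L1, L2):
    """Right-hand side of (68): -(1/2) sum_k w_k (R + R1 + R2_k)."""
    mu = theta * mu1 + (1 - theta) * mu2
    L = theta * L1 + (1 - theta) * L2
    E = L1 - L2
    M = np.linalg.solve(L, E)                      # L^{-1} (L1 - L2)
    R = np.trace(M @ M)                            # equals (69)
    delta = mu1 - mu2
    R1 = 2.0 * delta @ L @ delta                   # (70)
    R2 = 4.0 * (-delta) @ E @ (X - mu).T           # (71), one value per sample k
    return -0.5 * np.sum(w * (R + R1 + R2))

rng = np.random.default_rng(0)
N, K = 3, 25
X = rng.normal(size=(K, N))
w = rng.uniform(0.1, 1.0, size=K)                  # stand-in positive weights
B1, B2 = rng.normal(size=(N, N)), rng.normal(size=(N, N))
L1 = B1 @ B1.T + N * np.eye(N)                     # positive definite precision matrices
L2 = B2 @ B2.T + N * np.eye(N)
mu1, mu2 = rng.normal(size=N), rng.normal(size=N)

theta, h = 0.3, 1e-3
def f(t):
    return qp_on_segment(t, X, w, mu1, mu2, L1, L2)

finite_diff = (f(theta + h) - 2 * f(theta) + f(theta - h)) / h**2
analytic = qp_second_derivative(theta, X, w, mu1, mu2, L1, L2)
print(finite_diff, analytic)   # the two values agree to several significant digits
```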

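To summarize the GF iteration, first note that the mixing proportions $\{\alpha_j\}$ may be computed at the beginning from equation (50):

$$\alpha_j = \frac{T_j}{T}. \qquad (74)$$

Now let $\lambda^{(n)}$ be available from the previous iteration, and define the weights

$$w^{(n)}_{ij}(X_{kj}) = \frac{\pi^{(n)}_{ij}\,p_{ij}(X_{kj}|\mu^{(n)}_{ij},\Sigma^{(n)})}{\sum_{l=1}^{G_j}\pi^{(n)}_{lj}\,p_{lj}(X_{kj}|\mu^{(n)}_{lj},\Sigma^{(n)})}. \qquad (75)$$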


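The new intercomponent mixing proportions are updated using equation (56):

$$\pi^{(n+1)}_{ij} = \frac{1}{T_j}\sum_{k=1}^{T_j} w^{(n)}_{ij}(X_{kj}). \qquad (76)$$

Since $\pi^{(n+1)}_{ij} = 0$ if and only if $\pi^{(n)}_{ij} = 0$, GF training cannot be initialized with any zero mixing proportions. Specifically, we require that $\pi^{(0)}_{ij} \ne 0$ for all $i$ and $j$. The mean vectors are updated using equation (63):

$$\mu^{(n+1)}_{ij} = \frac{\sum_{k=1}^{T_j} w^{(n)}_{ij}(X_{kj})\,X_{kj}}{\sum_{k=1}^{T_j} w^{(n)}_{ij}(X_{kj})}. \qquad (77)$$

The new covariance matrix is found from equation (60):

$$\Sigma^{(n+1)} = \frac{1}{T}\sum_{j=1}^{M}\sum_{i=1}^{G_j}\sum_{k=1}^{T_j} w^{(n)}_{ij}(X_{kj})(X_{kj}-\mu^{(n+1)}_{ij})(X_{kj}-\mu^{(n+1)}_{ij})^t. \qquad (78)$$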

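Convergence of the GF algorithm can be tested in two ways. First, $Q(\lambda^{(n+1)}|\lambda^{(n)})$ can be computed using equation (43):

$$
\begin{aligned}
Q(\lambda^{(n+1)}|\lambda^{(n)}) &= \sum_{j=1}^{M} T_j\log\alpha_j - T\log\left((2\pi)^{N/2}|\Sigma^{(n+1)}|^{1/2}\right) \\
&\quad + \sum_{j=1}^{M}\sum_{i=1}^{G_j}\sum_{k=1}^{T_j} w^{(n)}_{ij}(X_{kj})\left\{\log\pi^{(n+1)}_{ij} - \tfrac{1}{2}(X_{kj}-\mu^{(n+1)}_{ij})^t\left[\Sigma^{(n+1)}\right]^{-1}(X_{kj}-\mu^{(n+1)}_{ij})\right\}. \qquad (79)
\end{aligned}
$$

Then $Q(\lambda^{(n+1)}|\lambda^{(n)})$ can be compared to $Q(\lambda^{(n)}|\lambda^{(n-1)})$ to determine whether the parameter estimates have stabilized; if they have, the algorithm is terminated. Alternatively, GF training can be terminated if the likelihood function $\mathcal{L}(T|\lambda^{(n)})$, as a function of $n$, ceases to increase at a sufficient rate.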
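The GF iteration (74)-(78) can be sketched as follows. This is a minimal NumPy illustration, not the original implementation: the function names, the data layout (one array of samples per class), the initialization, and the choice to monitor the labeled-data log-likelihood $\log\mathcal{L}(T|\lambda^{(n)}) = \sum_j\sum_k\log\left[\alpha_j\sum_i\pi_{ij}\,p_{ij}(X_{kj}|\mu_{ij},\Sigma)\right]$ are all assumptions. Because each update attains the maximum of its part of $Q$, this log-likelihood is expected to be nondecreasing from one iteration to the next.

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma_inv, logdet_Sigma):
    """Multivariate normal density N(x | mu, Sigma) at each row of X."""
    N = X.shape[1]
    d = X - mu
    quad = np.einsum("ki,ij,kj->k", d, Sigma_inv, d)
    return np.exp(-0.5 * (N * np.log(2 * np.pi) + logdet_Sigma + quad))

def component_densities(Xj, pj, mj, Sigma):
    """pi_ij * p_ij(X_kj | mu_ij, Sigma) as an array of shape (G_j, T_j)."""
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    return np.array([p * gaussian_pdf(Xj, m, Sigma_inv, logdet) for p, m in zip(pj, mj)])

def gf_step(classes, pis, mus, Sigma):
    """One GF iteration: weights (75), then updates (76), (77), and (78)."""
    T = sum(len(Xj) for Xj in classes)
    new_pis, new_mus, weights = [], [], []
    for Xj, pj, mj in zip(classes, pis, mus):
        dens = component_densities(Xj, pj, mj, Sigma)
        w = dens / dens.sum(axis=0, keepdims=True)              # (75): columns sum to 1
        weights.append(w)
        new_pis.append(w.mean(axis=1))                            # (76)
        new_mus.append((w @ Xj) / w.sum(axis=1, keepdims=True))  # (77)
    S = np.zeros_like(Sigma)
    for Xj, w, mj in zip(classes, weights, new_mus):              # (78): pooled over classes
        for i, m in enumerate(mj):
            d = Xj - m
            S += (w[i][:, None] * d).T @ d
    return new_pis, new_mus, S / T

def log_likelihood(classes, pis, mus, Sigma):
    """Labeled-data log-likelihood, with alpha_j = T_j / T from (74)."""
    T = sum(len(Xj) for Xj in classes)
    total = 0.0
    for Xj, pj, mj in zip(classes, pis, mus):
        mix = component_densities(Xj, pj, mj, Sigma).sum(axis=0)
        total += np.sum(np.log(len(Xj) / T * mix))
    return total

# Illustrative data: class 1 has two clusters, class 2 has one; two components per class.
rng = np.random.default_rng(1)
class1 = np.vstack([rng.normal([-3, 0], 1.0, (30, 2)), rng.normal([3, 0], 1.0, (30, 2))])
class2 = rng.normal([0, 4], 1.0, (40, 2))
classes = [class1, class2]
pis = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]   # no zero mixing proportions
mus = [class1[[0, 59]], class2[[0, 1]]]
Sigma = np.eye(2)

history = [log_likelihood(classes, pis, mus, Sigma)]
for n in range(25):
    pis, mus, Sigma = gf_step(classes, pis, mus, Sigma)
    history.append(log_likelihood(classes, pis, mus, Sigma))
```

In practice the loop would stop once successive entries of `history` differ by less than a chosen tolerance, which is the second stopping rule described above.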


References 


[1] S. J. Press, Applied Multivariate Analysis. Malabar, Florida: Krieger Publishing, 1982.

[2] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley-Interscience, 1973.

[3] R. L. Streit, “A neural network for optimum Neyman-Pearson classification,” in Proceedings of the International Joint Conference on Neural Networks, San Diego, California, vol. I, pp. 685-690, June 1990.

[4] R. P. Lippmann, “An introduction to computing with neural nets,” IEEE ASSP Magazine, vol. 4, pp. 4-22, April 1987.

[5] D. F. Specht, “Probabilistic neural networks for classification, mapping, or associative memory,” in Proceedings of the IEEE International Conference on Neural Networks, vol. I, pp. 525-532, July 1988.

[6] E. Parzen, “On estimation of a probability density function,” Annals of Mathematical Statistics, vol. 33, pp. 1065-1076, 1962.

[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Series B, vol. 39, pp. 1-38, 1977. (Article includes discussion.)

[8] R. A. Redner, R. J. Hathaway, and J. C. Bezdek, “Estimating the parameters of mixture models with modal estimators,” Communications in Statistics, Part A: Theory and Methods, vol. 16, pp. 2639-2660, 1987.

[9] R. L. Streit, “Class priors for entropy maximization,” Technical Memorandum 911144, Naval Underwater Systems Center, June 1991.

[10] S. S. Wilks, “Determination of sample sizes for setting tolerance limits,” Annals of Mathematical Statistics, vol. 12, pp. 91-96, 1941.

[11] H. Robbins, “On distribution-free tolerance limits in random sampling,” Annals of Mathematical Statistics, vol. 15, pp. 214-216, 1944.

[12] H. Scheffé and J. W. Tukey, “Non-parametric estimation I. Validation of order statistics,” Annals of Mathematical Statistics, vol. 16, pp. 187-192, 1945.

[13] A. Wald, “An extension of Wilks’ method for setting tolerance limits,” Annals of Mathematical Statistics, vol. 14, pp. 45-55, 1943.

[14] J. W. Tukey, “Non-parametric estimation II. Statistically equivalent blocks and tolerance regions: the continuous case,” Annals of Mathematical Statistics, vol. 18, pp. 529-539, 1947.

[15] J. W. Tukey, “Non-parametric estimation III. Statistically equivalent blocks and multivariate tolerance regions: the discontinuous case,” Annals of Mathematical Statistics, vol. 19, pp. 30-39, 1948.

[16] R. B. Murphy, “Non-parametric tolerance limits,” Annals of Mathematical Statistics, vol. 19, pp. 581-589, 1948.

[17] B. Juang, “Maximum-likelihood estimation for mixture multivariate stochastic observations of Markov chains,” AT&T Technical Journal, vol. 64, pp. 1235-1249, 1985.

[18] L. R. Liporace, “Maximum likelihood estimation for multivariate observations of Markov sources,” IEEE Transactions on Information Theory, vol. IT-28, pp. 729-734, September 1982.

[19] L. R. Rabiner, S. E. Levinson, and M. M. Sondhi, “An introduction to the application of the theory of probabilistic functions of a Markov process in automatic speech recognition,” The Bell System Technical Journal, vol. 62, pp. 1035-1074, 1983.


[Figure 1 artwork not recoverable from the scan; the only legible label is “Interconnection Weights.”]


Figure  1.  Four  Layer  Feed-Forward  Probabilistic  Neural  Network 




Figure 2 Optimum Discriminant Boundary Curve, With Scatter Plot of All Samples Superimposed




Figure  3  Contours  of  Class  #1  Conditional  PDF  in  dB//max,  With  Scatter  Plot 
Superimposed 




Figure  4  Contours  of  Class  #2  Conditional  PDF  in  dB//max,  With  Scatter  Plot 
Superimposed 




Figure  5  Contours  of  Class  #3  Conditional  PDF  in  dB//max,  With  Scatter  Plot 
Superimposed 




Figure  7  Contours  of  Decision  Assurance  tf(A')  for  Example  #1  in  dB//max 




Figure  8  Contours  of  Decision  Risk  p(X )  for  Example  #2  in  dB//max 




Figure 9 Contours of Decision Assurance for Example #2 in dB//max




Class  Priors  For 
Entropy  Maximization 

R.  L.  Streit 




Abstract 


Optimum  Bayesian  classification  requires  knowledge  of  class 
a  priori  probabilities.  Maximum  entropy  class  priors  are  proposed 
here  as  a  natural  choice  for  applications  in  which  the  class  a  priori 
probabilities  are  unavailable  and  cannot  be  estimated. 




TM  No.  911144 


CLASS  PRIORS  FOR  ENTROPY  MAXIMIZATION 

INTRODUCTION 

The correct scalings of class probability density functions (PDF's) for optimum classification are the class a priori probabilities. This note derives "natural" class priors that are useful in applications in which class a priori probabilities are unknown. Their utility stems from the natural way in which they scale class PDF's to accommodate relative variations in PDF support and peakedness. A theoretical justification for these priors in terms of maximum entropy is derived.


PRESENTATION 

It is assumed here that class PDF's are estimated from available class training samples. The differential entropy $H(X)$ of a random vector variable $X$ defined on $\mathbb{R}^n$ with continuous PDF $p(x)$ is defined by

$$H(X) = -\int_{\mathbb{R}^n} p(x)\log p(x)\,dx. \qquad (1)$$

The integral in (1) is the expected value of the function $\log p(X)$. If $T \ge 1$ samples of $X$ are given, and from these samples an estimate $\hat{p}(x)$ for $p(x)$ has been developed, then the (posterior) sample entropy is defined here by

$$\hat{H}(X) = -\frac{1}{T}\sum_{k=1}^{T}\log\hat{p}(x_k), \qquad (2)$$


where $\{x_1,\ldots,x_T\}$ denotes the available samples of $X$. Note that

$$\exp(-\hat{H}(X)) = \left[\prod_{k=1}^{T}\hat{p}(x_k)\right]^{1/T}.$$

Thus, $\exp(-\hat{H}(X))$ is exactly the geometric mean of the numbers $\{\hat{p}(x_k)\}$.

Suppose that there are $m \ge 1$ classes characterized by the random vector variables $X_1,\ldots,X_m$ defined on $\mathbb{R}^n$ with corresponding continuous PDF's $p_1(x),\ldots,p_m(x)$. Given training samples of these random variables, estimates $\hat{p}_1(x),\ldots,\hat{p}_m(x)$ of the class PDF's are developed by methods (e.g., Parzen windows) not relevant to the present discussion. The proposed prior for each class is proportional to the reciprocal of the geometric mean of the estimated class PDF evaluated at that class's training samples. Let $\hat{H}(X_i)$ denote the sample entropy (cf. equation (2)) of $X_i$ derived from the available training samples for $X_i$, and let $\hat{\alpha}_i$ denote the proposed prior for class $i$. Then $\hat{\alpha}_i$ is given by

$$\hat{\alpha}_i = \frac{\exp(\hat{H}(X_i))}{\sum_{k=1}^{m}\exp(\hat{H}(X_k))}. \qquad (3)$$

The denominator in equation (3) normalizes so that $\sum_{i=1}^{m}\hat{\alpha}_i = 1$.

Intuitively, the priors $\{\hat{\alpha}_i\}$ scale down high peaks of class PDF's whose support is compact, and scale up the broad plateaus of class PDF's whose support is widely distributed.
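Equations (2) and (3) translate directly into code. In the Python sketch below, the PDF estimates are assumed, purely for illustration, to be one-dimensional Gaussians fitted by maximum likelihood in place of the Parzen-window estimates mentioned above; any estimator that can evaluate $\log\hat{p}_i$ at the training samples could be substituted. The helper names and data are hypothetical. For such a Gaussian fit, $\hat{H}(X_i) = \tfrac{1}{2}\log(2\pi\hat{\sigma}_i^2) + \tfrac{1}{2}$, so the ratio of two priors reduces to the ratio of the fitted standard deviations, which makes the "scale up broad plateaus" effect easy to see.

```python
import math
import random

def fit_gaussian(samples):
    """ML fit of a 1-D Gaussian, returning (mean, std); a stand-in PDF estimator."""
    m = sum(samples) / len(samples)
    var = sum((x - m) ** 2 for x in samples) / len(samples)
    return m, math.sqrt(var)

def gaussian_logpdf(x, m, s):
    return -0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)

def sample_entropy(samples, logpdf):
    """Eq. (2). exp(-H) is the geometric mean of the estimated PDF at the samples."""
    return -sum(logpdf(x) for x in samples) / len(samples)

def max_entropy_priors(entropies):
    """Eq. (3): alpha_i = exp(H_i) / sum_k exp(H_k), shifted by max(H) to avoid overflow."""
    h_max = max(entropies)
    e = [math.exp(h - h_max) for h in entropies]
    total = sum(e)
    return [v / total for v in e]

rng = random.Random(0)
narrow = [rng.gauss(0.0, 0.5) for _ in range(200)]   # class with a peaked PDF
broad = [rng.gauss(5.0, 2.0) for _ in range(200)]    # class with a widely spread PDF

fits = [fit_gaussian(xs) for xs in (narrow, broad)]
H = [sample_entropy(xs, lambda x, m=m, s=s: gaussian_logpdf(x, m, s))
     for xs, (m, s) in zip((narrow, broad), fits)]
priors = max_entropy_priors(H)
print(priors)   # the broad class receives about four times the prior of the narrow class
```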


The class prior probabilities (3) can be written in terms of class entropy powers. The entropy power $N(X)$ of a general random variable $X$ is defined by

$$N(X) = \frac{1}{2\pi e}\exp(2H(X)). \qquad (4)$$

The entropy power $N(X)$ is the average noise power of a Gaussian random variable having differential entropy $H(X)$. The prior $\hat{\alpha}_i$ is clearly proportional to the square root of the sample entropy power $\hat{N}(X_i)$ of the $i$-th class.

Theoretical justification for use of the class priors (3) proceeds as follows. Let $\mathcal{Y}$ denote a memoryless source modeling the class sampling process. $\mathcal{Y}$ is a two-step process. The first step selects a random variable, or class symbol, $X_i$ from the list $\{X_1,\ldots,X_m\}$ with probability $\alpha_i$. The second step selects a sample $x$ from the random variable $X_i$ selected in the first step. The likelihood of $x$ is $p_i(x)$. The output of the source $\mathcal{Y}$ is the pair $(X_i, x)$. An important distinction is that $\mathcal{Y}$ is not equivalent to a source $\tilde{\mathcal{Y}}$ whose outputs $x$ are selected from a random vector variable whose PDF is the mixture

$$p(x) = \sum_{i=1}^{m}\alpha_i\,p_i(x).$$

The difference between the sources $\mathcal{Y}$ and $\tilde{\mathcal{Y}}$ is that the output from $\mathcal{Y}$ contains class label information, while the output from $\tilde{\mathcal{Y}}$ does not. The differential entropy of $\mathcal{Y}$ is

$$H(\mathcal{Y}) = -\sum_{i=1}^{m}\int_{\mathbb{R}^n}\alpha_i\,p_i(x)\log\left[\alpha_i\,p_i(x)\right]dx. \qquad (5)$$

Equation (5) is derived by noting that the output symbol $(X_i, x)$ occurs with likelihood $\alpha_i p_i(x)$, so that the differential entropy is the appropriate sum and integral of this likelihood multiplied by its logarithm, with the sign reversed. It follows easily from equation (5) that

$$H(\mathcal{Y}) = -\sum_{i=1}^{m}\alpha_i\left[\log\alpha_i - H(X_i)\right]. \qquad (6)$$

We seek a stationary point for $H(\mathcal{Y})$ over all $\{\alpha_i\}$ satisfying $\sum_{i=1}^{m}\alpha_i = 1$. A straightforward application of Lagrange multipliers shows that the unique stationary point is

$$\alpha_i = \frac{\exp(H(X_i))}{\sum_{k=1}^{m}\exp(H(X_k))}. \qquad (7)$$

This stationary point is a maximum for the differential entropy $H(\mathcal{Y})$, as can be shown by examining the second-order derivatives ($\partial^2 H(\mathcal{Y})/\partial\alpha_i^2 = -1/\alpha_i < 0$, and all mixed second partial derivatives vanish). The priors (7) are therefore maximum entropy priors for the class sampling process $\mathcal{Y}$. The priors (3) are sample entropy versions of the maximum entropy priors (7). Substituting (7) into (6) shows that the maximum entropy $H(\mathcal{Y})$ corresponding to the priors (7) is the logarithm of the denominator of equation (7).
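The maximizing property of (7), and the value $\log\sum_k\exp(H(X_k))$ of the maximum, can be checked numerically from (6). In the Python sketch below, the class entropies are arbitrary illustrative numbers, and the helper names are hypothetical; it compares the priors (7) with diffuse priors $1/m$ and with random points of the probability simplex.

```python
import math
import random

def source_entropy(priors, class_entropies):
    """Eq. (6): H(Y) = -sum_i alpha_i (log alpha_i - H(X_i)); zero priors contribute 0."""
    return -sum(a * (math.log(a) - h) for a, h in zip(priors, class_entropies) if a > 0)

def max_entropy_priors(class_entropies):
    """Eq. (7), computed with a shifted exponent to avoid overflow."""
    h_max = max(class_entropies)
    e = [math.exp(h - h_max) for h in class_entropies]
    total = sum(e)
    return [v / total for v in e]

H = [0.5, 1.2, 2.0]                                   # illustrative class entropies H(X_i)
best = source_entropy(max_entropy_priors(H), H)
log_denominator = math.log(sum(math.exp(h) for h in H))
diffuse = source_entropy([1 / len(H)] * len(H), H)

rng = random.Random(1)
random_values = []
for _ in range(1000):
    u = [rng.random() for _ in H]
    random_values.append(source_entropy([x / sum(u) for x in u], H))

print(best, log_denominator, diffuse, max(random_values))
```

The first two printed values coincide, and neither the diffuse priors nor any sampled priors exceed them.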

Given independent class training samples, class priors can be estimated by maximum likelihood (ML) methods. As is easily shown, ML class a priori probability estimates are proportional to the relative abundance of the individual class training samples. ML priors are thus independent of class PDF structure. Unfortunately, in practice the class training samples are very often screened in such a way that the relative abundances of class samples contain no information about class priors. In such situations, it is common practice to choose diffuse class priors, that is, all class a priori probabilities are set equal to 1/m. Diffuse priors are thus blind to class training samples and class PDF structure. The maximum entropy priors proposed here are a more natural choice than diffuse priors for Bayesian classification applications because they maximize the entropy of the class selection process $\mathcal{Y}$. Maximum entropy priors defined by equation (7) depend on class PDF structure and are independent of relative class training sample abundance.




MATHEMATICS 


Foreword 


Complex function approximation by linear combinations of complex valued basis functions is well understood, both mathematically and numerically, if least squared error is used to measure the closeness of the approximation. However, the difficulty of the complex approximation problem is dramatically increased by a change in the error criterion. Papers [25] - [28] study the problem using the $l_\infty$ norm error criterion. (The $l_\infty$ norm is known by several names, e.g., Chebyshev norm and maximum error.) Only two methods are currently known for solving general $l_\infty$ approximation problems numerically. One approach is via iteratively reweighted least squares problems, and this approach is studied theoretically in paper [25].

The  other  approach  to  approximation  is  via  semi-infinite  programming  (SIP),  a 
specialized  variant  of  cutting  plane  methods  for  convex  optimization.  The  SIP  approach  is 
presented  in  papers  [26]  -  [28].  Application  of  this  work  to  linear  acoustic  array 
beamforming  problems  is  given  in  papers  [1]  -  [3].  The  provision  of  magnitude  constraints 
in  the  SIP  problem  formulation  also  enables  its  application  to  active  conformal  array 
beamforming  problems  in  which  acoustic  effects  between  the  elements  (projectors)  must  be 
limited  (cf.,  [paper  2,  page  3]).  An  application  of  SIP  methods  to  the  solution  of  systems 
of  indefinite  finite  difference  equations  by  polynomial  iteration  methods  is  presented  in 
paper  [29]. 

Papers  [30]  and  [31]  describe  concertina-like  variations  in  the  detailed  structure  of  a 
special  class  of  real  valued  functions  (Haar  systems)  generalizing  the  Chebyshev 
polynomials.  These  theoretical  results  support  observations  made  while  studying  the  linear 
array  problems  described  in  papers  [9]  and  [10].  The  remaining  papers  [32]  -  [34]  also 
document  results  encountered  in  the  course  of  other  investigations. 


Saddle  Points  And  Overdetermined 
Complex  Equations 


R.  L.  Streit 




Saddle  Points  and  Overdetermined  Complex  Equations 

Roy  L.  Streit* 

Naval  Underwater  Systems  Center 
New  London  Laboratory 
New  London,  Connecticut  06320 


Submitted  by  Richard  A.  Brualdi 


ABSTRACT 

It is known that the best uniform norm solution of overdetermined complex valued systems of equations satisfying the Haar condition for matrices is also a best weighted $l_p$ norm solution for each $p > 1$, for some weight vector depending on $p$. This paper presents an alternative proof of this result which is valid for arbitrary matrices $A$. The proof relies on the fundamental theorem of game theory. It is shown that a saddle point $(z^*, \lambda^*)$ of a certain function gives a uniform norm solution, $z^*$, of $Az = b$ and a weight vector $\lambda^*$ of the equivalent weighted $l_p$ norm problem. With appropriate qualifications concerning the weights, it follows that the worst (i.e., largest) possible weighted least $l_p$ norm error is also the best (i.e., least) possible Chebyshev error. For $p = 2$, it is shown that the weight vector $\lambda^*$ solves a nonlinear optimization problem which can be posed without reference to solution vectors of $Az = b$. In other words, the problem of finding the best uniform norm solution of $Az = b$, when stated as a convex optimization problem, has a convex dual which for $p = 2$ can be posed independently of the primal variables $z$. The dual variables are the weights $\lambda$.


I.  INTRODUCTION 

This paper investigates the solution of overdetermined systems of complex linear equations using the uniform, or $l_\infty$, norm. An equivalent alternative context in which results can be presented and interpreted is complex function approximation on discrete point sets. It is in the latter context that Motzkin and Walsh [1, 2] prove that the best uniform norm solution of an overdetermined real system $Ax = b$ is equivalent to a weighted least pth power solution for each $p > 0$, assuming that the matrix $A$ satisfies the Haar condition for matrices. Their results are extended to complex systems by Lawson [3], who also gives an algorithm for constructing the weights of equivalent least pth power problems for $p > 1$. A nice summary of these results is given in [4, 5].

*This work was supported by the Office of Naval Research Project RR01407-01 and by the Independent Research Program of the Naval Underwater Systems Center.

LINEAR ALGEBRA AND ITS APPLICATIONS 64:57-76 (1985)

An alternative proof of the Motzkin-Walsh result for $p > 1$ is given in this paper. The proof does not assume the Haar condition and, in fact, is valid for arbitrary complex matrices $A$. With appropriate qualifications concerning the weights, it is also proved that the worst (i.e., largest) possible weighted least pth power error is also the best (i.e., least) possible uniform error. This result seems to be new, although it is implicit in the proof of the convergence of Lawson's algorithm. Lawson, however, does not state it explicitly.

Theorem 1 states that the function $\phi_p$, defined by (1), has a saddle point in a certain domain. All other results follow essentially as corollaries. This approach is very different from that of Motzkin and Walsh. The proof of Theorem 1 relies on a connection between the solutions of overdetermined systems of equations (or, equivalently, approximation on discrete point sets) and the fundamental theorem of game theory. This relationship does not seem to be mentioned elsewhere.

The  special  case  p  =  2  is  particularly  interesting.  In  Theorem  3  below,  it 
is  shown  that  the  weights  for  the  equivalent  least  squares  problem  solve  a 
nonlinear  mathematical  programming  problem.  So  far  as  is  known  to  the 
author,  these  weights  have  not  been  previously  characterized  in  quite  this 
manner.  A  special  subcase  is  a  problem  posed  by  J.  J.  Sylvester  in  1857.  It  is 
discussed  in  Section  IV.  Theorem  3  can  also  be  used  to  prove  a  result  due  to 
de  la  Vallee  Poussin  [6]  for  real  systems  and  extended  to  complex  systems  by 
Rivlin and Shapiro [7]. It is discussed in Section V.

Motzkin and Walsh prove their function approximation results on the interval [0, 1] as well as on discrete point sets. It is therefore likely that Theorem 1 can be generalized so that it applies to systems having an infinite number of equations in a finite number of unknowns. For the purposes of this paper, however, Theorem 1 in its present form is satisfactory.

The Motzkin-Walsh results do not give insight into how the correct weights for the equivalent least pth power problem might be constructed. Lawson's original algorithm is apparently the only one currently available, and its convergence proof assumes the Haar condition for $A$. The algorithm requires the solution of a sequence of weighted least pth power problems, updating the weights at each step of the sequence. The correct weights are obtained in the limit. The special case $p = 2$ is of the greatest computational interest, since least squares problems are easily solved. The major drawback to Lawson's algorithm is that convergence can be, and often is, very slow in practice [8].
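For $p = 2$, the reweighting scheme described above can be sketched in a few lines. This is an illustration under assumptions rather than Lawson's published code: it uses the multiplicative update usually attributed to Lawson, $\lambda_i \leftarrow \lambda_i|r_i| / \sum_l \lambda_l|r_l|$, where $r = b - Az$ is the residual of the current weighted least squares solution, starts from uniform weights, runs a fixed number of iterations, and solves each weighted problem with NumPy's `lstsq`. Complex $A$ and $b$ are accepted. The function names are hypothetical, and the update would divide by zero if every weighted residual vanished (for example, when $Az = b$ is exactly solvable), which the sketch does not guard against.

```python
import numpy as np

def weighted_least_squares(A, b, lam):
    """Minimize sum_i lam_i |b_i - (Az)_i|^2; return z, residual moduli, and the error."""
    s = np.sqrt(lam)
    z = np.linalg.lstsq(s[:, None] * A, s * b, rcond=None)[0]
    r = np.abs(b - A @ z)
    return z, r, np.sqrt(np.sum(lam * r**2))      # last value is phi_2(z, lam)

def lawson(A, b, iterations=100):
    """Lawson-style reweighting toward a Chebyshev (uniform norm) solution of Az = b."""
    m = A.shape[0]
    lam = np.full(m, 1.0 / m)                     # uniform starting weights on the simplex
    z, r, err = weighted_least_squares(A, b, lam)
    for _ in range(iterations):
        lam = lam * r / np.sum(lam * r)           # multiplicative weight update
        z, r, err = weighted_least_squares(A, b, lam)
    return z, lam, err

# Fit one constant to b = (0, 1, 3): the Chebyshev solution is the midrange, 1.5.
A = np.ones((3, 1))
b = np.array([0.0, 1.0, 3.0])
z, lam, err = lawson(A, b)
print(z, lam, err, np.max(np.abs(b - A @ z)))
```

In this small example the weights concentrate on the two extreme data values, and the weighted least squares error `err` approaches the Chebyshev error 1.5.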




Alternatives to Lawson's algorithm can be based on Theorems 1 and 2. In other words, general algorithms for computing a saddle point of a given continuous function can be applied to the problem at hand, i.e., to $\phi_p$ below. In particular, such algorithms would be applicable to the case $p = 1$, which Lawson's algorithm does not treat. Conversely, since the Lawson algorithm can now be interpreted as a procedure for computing a saddle point of $\phi_p$ for $p > 1$, it would be interesting to know whether or not Lawson's algorithm constitutes a special case of an existing algorithm for computing saddle points. If not, perhaps it can be extended to construct saddle points of more general functions.


II.  THE  THEOREMS 


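Let the complex matrix $A = [a_{ij}] \in \mathbb{C}^{m\times n}$ and the vector $b = (b_i) \in \mathbb{C}^m$ be given. Let $1 \le p < \infty$. Define $\phi_p : \mathbb{C}^n \times \mathbb{R}^m \to \mathbb{R}$ by

$$\phi_p(z,\lambda) = \left(\sum_{i=1}^{m}\lambda_i\left|b_i - \sum_{j=1}^{n}a_{ij}z_j\right|^p\right)^{1/p}, \qquad (1)$$

where $z = (z_j) \in \mathbb{C}^n$ and $\lambda = (\lambda_i) \in \mathbb{R}^m$. Define $\Lambda = \{\lambda \in \mathbb{R}^m : \lambda \ge 0 \text{ and } \lambda_1 + \cdots + \lambda_m = 1\}$. Note that $\Lambda$ is convex. A point $(z^*,\lambda^*)$ is defined to be a saddle point of $\phi_p$ on $\mathbb{C}^n \times \Lambda$ if $(z^*,\lambda^*) \in \mathbb{C}^n \times \Lambda$ and

$$\phi_p(z^*,\lambda) \le \phi_p(z^*,\lambda^*) \le \phi_p(z,\lambda^*)$$

for all $z \in \mathbb{C}^n$ and $\lambda \in \Lambda$.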


The central result is the following theorem. Its proof relies on the fundamental theorem of game theory and is postponed to Section III.


Theorem 1. For every matrix $A \in \mathbb{C}^{m\times n}$ and vector $b \in \mathbb{C}^m$, the function $\phi_p$ has a saddle point on the set $\mathbb{C}^n \times \Lambda$.

No assertion of uniqueness of the saddle point is made by Theorem 1.
Sufficient  conditions  for  uniqueness  are  not  pursued  in  this  paper. 

It should be noted that Theorem 1 is valid for all $n \ge 1$ and $m \ge 1$. Define the uniform norm $\|\cdot\|_\infty$ of any vector in $\mathbb{C}^m$ to be the maximum modulus of its components. A Chebyshev solution of the system of equations $Az = b$ is any vector $z^*$ for which

$$\|b - Az^*\|_\infty = \min_{z\in\mathbb{C}^n}\|b - Az\|_\infty. \qquad (2)$$




A weighted least pth power solution of $Az = b$ is any vector $z^*$ for which

$$\phi_p(z^*,\lambda) = \min_{z\in\mathbb{C}^n}\phi_p(z,\lambda), \qquad (3)$$

where $\lambda$ is a given vector of nonnegative weights. The next theorem connects Chebyshev solutions and weighted least pth power solutions.

Theorem 2. Let $(z^*,\lambda^*) \in \mathbb{C}^n \times \Lambda$ be a saddle point for $\phi_p$, $1 \le p < \infty$. Then

(i) $z^*$ is a Chebyshev solution of $Az = b$;

(ii) $z^*$ is a weighted least pth power solution of $Az = b$ for the weight vector $\lambda^*$.

Furthermore, the saddle value $\phi_p(z^*,\lambda^*)$ is the error of the Chebyshev solution, i.e., $\phi_p(z^*,\lambda^*) = \|b - Az^*\|_\infty$.

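Proof. By a well-known result [14, Theorem 3.15], $\phi_p$ has a saddle point $(z^*, \lambda^*)$ on $\mathbb{C}^n \times \Lambda$ if and only if

$$\max_{\lambda \in \Lambda} \min_{z \in \mathbb{C}^n} \phi_p(z,\lambda) = \phi_p(z^*,\lambda^*) = \min_{z \in \mathbb{C}^n} \max_{\lambda \in \Lambda} \phi_p(z,\lambda). \qquad (4)$$

Let the largest of the $m$ quantities $|b_i - \sum_j a_{ij} z_j|$ occur for, say, $i = k$ (depending of course on $z$). Then

$$\max_{\lambda \in \Lambda} \phi_p(z,\lambda) = \Big| b_k - \sum_{j=1}^{n} a_{kj} z_j \Big|,$$

because $\phi_p(z,\lambda)^p$ is a convex combination of the numbers $|b_i - \sum_j a_{ij} z_j|^p$, and this bound is attained by taking $\lambda_k = 1$ and $\lambda_i = 0$ for $i \ne k$. Equivalently, $\max_{\lambda \in \Lambda} \phi_p(z,\lambda) = \|b - Az\|_\infty$.

Consequently,

$$\phi_p(z^*,\lambda^*) = \min_{z \in \mathbb{C}^n} \max_{\lambda \in \Lambda} \phi_p(z,\lambda) = \min_{z \in \mathbb{C}^n} \|b - Az\|_\infty,$$

and, since $\phi_p(z^*,\lambda^*) = \max_{\lambda \in \Lambda} \phi_p(z^*,\lambda) = \|b - Az^*\|_\infty$ by the saddle point property, $z^*$ is a Chebyshev solution of $Az = b$. The saddle value $\phi_p(z^*,\lambda^*)$ is the Chebyshev error. Next note that

$$\phi_p(z^*,\lambda^*) = \max_{\lambda \in \Lambda} \min_{z \in \mathbb{C}^n} \phi_p(z,\lambda) = \min_{z \in \mathbb{C}^n} \phi_p(z,\lambda^*), \qquad (5)$$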


OVERDETERMINED  COMPLEX  EQUATIONS 




and so $z^*$ is a weighted least $p$th power solution of $Az = b$ for the weight vector $\lambda^*$. This completes the proof. ■
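The key step in the proof, that maximizing $\phi_p$ over $\Lambda$ picks out the largest residual, can be checked numerically. The following is a minimal Python sketch that assumes plain nested lists for $A$ and built-in complex scalars. The helper names `residuals`, `phi_p`, `max_over_vertices` and `uniform_residual` are hypothetical. Because $\phi_p(z,\lambda)^p$ is linear in $\lambda$, its maximum over $\Lambda$ is attained at a vertex $e_k$, so checking the $m$ vertices is sufficient.

```python
def residuals(A, b, z):
    """Residuals r_i = b_i - sum_j a_ij z_j of the system A z = b."""
    return [b_i - sum(a_ij * z_j for a_ij, z_j in zip(row, z)) for row, b_i in zip(A, b)]


def phi_p(A, b, z, lam, p):
    """phi_p(z, lam) of equation (1): a lam-weighted l_p norm of the residual."""
    total = sum(lam_i * abs(r_i) ** p for lam_i, r_i in zip(lam, residuals(A, b, z)))
    return total ** (1.0 / p)


def max_over_vertices(A, b, z, p):
    """Largest value of phi_p(z, e_k) over the vertices e_1, ..., e_m of Lambda."""
    m = len(b)
    vertices = ([1.0 if i == k else 0.0 for i in range(m)] for k in range(m))
    return max(phi_p(A, b, z, e_k, p) for e_k in vertices)


def uniform_residual(A, b, z):
    """||b - A z||_inf, the largest residual modulus."""
    return max(abs(r_i) for r_i in residuals(A, b, z))


A = [[1, 1j], [1, 0], [2, -1]]
b = [1, 2j, 0.5]
z = [0.3 + 0.1j, -0.2j]
for p in (1, 2, 3.5):
    # the two printed numbers should agree for every p
    print(p, max_over_vertices(A, b, z, p), uniform_residual(A, b, z))
```

For any non-vertex weight vector, such as the uniform one, $\phi_p$ can only be smaller, in line with $\max_{\lambda \in \Lambda} \phi_p(z,\lambda) = \|b - Az\|_\infty$.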

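Corollary 1. Define, for all $\lambda \in \Lambda$,

$$\psi_p(\lambda) = \min_{z \in \mathbb{C}^n} \phi_p(z,\lambda). \qquad (6)$$

Then, under the conditions of Theorem 2,

$$\psi_p(\lambda) \le \psi_p(\lambda^*) = \|b - Az^*\|_\infty \quad \text{for all } \lambda \in \Lambda.$$

Proof. Immediate from (5). ■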

Another way to say this is as follows. The error of a weighted least $p$th power solution of the system $Az = b$ is $\psi_p(\lambda)$, and this error is maximized over the allowed weights $\lambda \in \Lambda$ at $\lambda = \lambda^*$. Furthermore, the maximum of this error equals the minimum Chebyshev error of $Az = b$.

Corollary 2. The vector $b$ is a linear combination of the columns of the matrix $A$ if and only if the saddle value of $\phi_p$ is zero.

Proof. Let $(z^*, \lambda^*)$ be a saddle point of $\phi_p$. If $\phi_p(z^*,\lambda^*) = \|b - Az^*\|_\infty = 0$, then $b = Az^*$. Conversely, if $b = Az$ for some $z \in \mathbb{C}^n$, then $\psi_p(\lambda) = 0$ for all $\lambda$, so that $\max_{\lambda \in \Lambda} \psi_p(\lambda) = 0 = \phi_p(z^*,\lambda^*)$. ■

The vectors $z^*$ and $\lambda^*$ are defined jointly via the saddle point property of $\phi_p$. Theorem 2 shows that $z^*$ also solves an optimization problem that does not require knowledge of $\lambda^*$; that is, $z^*$ is a Chebyshev solution of $Az = b$. In many cases this property alone will uniquely determine $z^*$. Theorem 3 below will show that an analogous situation exists with regard to $\lambda^*$ for the special case $p = 2$. The distinction of the case $p = 2$ is a consequence of the fact that $\psi_2(\lambda)$, as defined by (6), can be expressed explicitly in terms of $\lambda$ alone. For other values of $p$, use of an implicit function theorem seems to be necessary.

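Define the complex matrix $L(\lambda) = [L_{jk}(\lambda)] \in \mathbb{C}^{n\times n}$ by

$$L(\lambda) = A^H \operatorname{diag}(\lambda) A, \qquad (7)$$

where $A^H$ is the conjugate transpose of $A$, and $\operatorname{diag}(\lambda)$ is the $m \times m$ diagonal matrix whose main diagonal consists of the components of $\lambda$. Thus,

$$L_{jk}(\lambda) = \sum_{i=1}^{m} \lambda_i \bar a_{ij} a_{ik}, \qquad j, k = 1, \ldots, n. \qquad (8)$$

The complex matrix $M(\lambda) \in \mathbb{C}^{(n+1)\times(n+1)}$ is defined by bordering $L(\lambda)$:

$$M(\lambda) = \begin{bmatrix} L(\lambda) & \beta(\lambda) \\ \beta(\lambda)^H & \alpha(\lambda) \end{bmatrix}, \qquad (9)$$

where $\beta(\lambda) = (\beta_j(\lambda)) \in \mathbb{C}^n$ is given by

$$\beta_j(\lambda) = \sum_{i=1}^{m} \lambda_i b_i \bar a_{ij}, \qquad j = 1, \ldots, n, \qquad (10)$$

and $\alpha(\lambda) \in \mathbb{R}$ is given by

$$\alpha(\lambda) = \sum_{i=1}^{m} \lambda_i |b_i|^2. \qquad (11)$$

Note that both $L(\lambda)$ and $M(\lambda)$ are Hermitian matrices.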

The matrix $A \in \mathbb{C}^{m\times n}$ with $m \ge n$ is said to satisfy the Haar condition for matrices if and only if every collection of $n$ rows of $A$ has rank $n$.

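Theorem 3. Let $m > n$, and let $A$ satisfy the Haar condition for matrices. Let $(z^*, \lambda^*)$ be a saddle point of $\phi_2(z,\lambda)$ on $\mathbb{C}^n \times \Lambda$ with saddle value $\phi_2(z^*,\lambda^*) > 0$. Then

$$\min_{z \in \mathbb{C}^n} \|b - Az\|_\infty = \|b - Az^*\|_\infty \qquad (12)$$

$$= \phi_2(z^*,\lambda^*) \qquad (13)$$

$$= \left[ \frac{\det M(\lambda^*)}{\det L(\lambda^*)} \right]^{1/2} \qquad (14)$$

$$= \max_{\lambda} \left[ \frac{\det M(\lambda)}{\det L(\lambda)} \right]^{1/2}, \qquad (15)$$

where the maximum (15) is taken over $\lambda \in \Lambda$ with $\det L(\lambda) \ne 0$.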


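Proof. Both (12) and (13) follow from Theorems 1 and 2, so it is necessary to prove only (14) and (15). The definition (6) for $p = 2$ is

$$\psi_2^2(\lambda) = \min_{z \in \mathbb{C}^n} \sum_{i=1}^{m} \lambda_i \Big| b_i - \sum_{j=1}^{n} a_{ij} z_j \Big|^2. \qquad (16)$$

By Corollary 1,

$$\phi_2(z^*,\lambda^*) = \psi_2(\lambda^*) = \max_{\lambda \in \Lambda} \psi_2(\lambda). \qquad (17)$$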


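Since the saddle value is positive, we restrict attention throughout the remainder of this proof to those vectors $\lambda$ for which $\psi_2(\lambda) > 0$. (By Corollary 2, the vector $b$ is not a linear combination of the columns of $A$.) Since $A$ satisfies the Haar condition for matrices, it follows that $\lambda$ has $n + 1$ or more positive components. Otherwise the at most $n$ equations with positive weight could be satisfied exactly, because their coefficient rows are linearly independent, and (16) would give $\psi_2(\lambda) = 0$. Consequently, the Hermitian form of $L(\lambda)$,

$$z^H L(\lambda) z = z^H A^H \operatorname{diag}(\lambda) A z = \sum_{i=1}^{m} \lambda_i \Big| \sum_{j=1}^{n} a_{ij} z_j \Big|^2,$$

must be positive for nonzero vectors $z$. Hence, $\det L(\lambda) \ne 0$, and the normal equations

$$L(\lambda) z = [A^H \operatorname{diag}(\lambda) A] z = \beta(\lambda) \qquad (18)$$

are nonsingular. It is convenient within the confines of this proof to define the auxiliary symbols

$$z_{n+1} = -1, \qquad a_{i,n+1} = b_i, \qquad i = 1, \ldots, m. \qquad (19)$$

Rewriting the normal equations as

$$\sum_{j=1}^{n} \sum_{i=1}^{m} \lambda_i \bar a_{i\nu} a_{ij} z_j = \sum_{i=1}^{m} \lambda_i \bar a_{i\nu} b_i, \qquad \nu = 1, \ldots, n, \qquad (20)$$

and using the symbols (19) gives

$$\sum_{j=1}^{n+1} \sum_{i=1}^{m} \lambda_i \bar a_{i\nu} a_{ij} z_j = 0, \qquad \nu = 1, \ldots, n. \qquad (21)$$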


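Now, from (16), for any $z$ satisfying the normal equations,

$$\psi_2^2(\lambda) = \sum_{i=1}^{m} \lambda_i \Big| b_i - \sum_{j=1}^{n} a_{ij} z_j \Big|^2 = \sum_{i=1}^{m} \lambda_i \Big| \sum_{j=1}^{n+1} a_{ij} z_j \Big|^2 = \sum_{i=1}^{m} \sum_{j=1}^{n+1} \sum_{\nu=1}^{n+1} \lambda_i \bar a_{i\nu} \bar z_\nu a_{ij} z_j.$$

Reversing the order of the triple sum gives

$$\psi_2^2(\lambda) = \sum_{\nu=1}^{n+1} \bar z_\nu \sum_{j=1}^{n+1} \sum_{i=1}^{m} \lambda_i \bar a_{i\nu} a_{ij} z_j = -\sum_{j=1}^{n+1} \sum_{i=1}^{m} \lambda_i \bar b_i a_{ij} z_j, \qquad (22)$$

where, in the last equation, (21) was used to set to zero the double sum for the cases $\nu = 1, \ldots, n$, and for $\nu = n+1$ the value $z_{n+1} = -1$ was substituted. Rewriting (22) without the symbols (19) gives

$$\psi_2^2(\lambda) + \sum_{j=1}^{n} \sum_{i=1}^{m} \lambda_i \bar b_i a_{ij} z_j = \sum_{i=1}^{m} \lambda_i |b_i|^2. \qquad (23)$$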

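Thus (20) and (23) constitute $n + 1$ equations in the $n + 1$ unknowns $z_1, \ldots, z_n$ and $\psi_2^2(\lambda)$. Since $\sum_i \lambda_i \bar b_i a_{ij}$ is the $j$th component of the row vector $\beta(\lambda)^H$, this system can be written

$$\begin{bmatrix} L(\lambda) & 0 \\ \beta(\lambda)^H & 1 \end{bmatrix} \begin{bmatrix} z \\ \psi_2^2(\lambda) \end{bmatrix} = \begin{bmatrix} \beta(\lambda) \\ \alpha(\lambda) \end{bmatrix}.$$

The coefficient matrix has determinant $\det L(\lambda) \ne 0$, so it has rank $n + 1$ and is nonsingular. Solving for $\psi_2^2(\lambda)$ using Cramer's rule gives

$$\psi_2^2(\lambda) = \frac{\det M(\lambda)}{\det L(\lambda)}. \qquad (24)$$

Taking square roots in (24) and substituting for $\psi_2$ in (17) concludes the proof.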
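Identity (24) can be checked directly. For a fixed $\lambda$, solve the normal equations (18), evaluate the weighted squared residual (16), and compare the result with $\det M(\lambda)/\det L(\lambda)$. The sketch below is a minimal pure-Python illustration that assumes small dense matrices stored as nested lists. Determinants are computed by cofactor expansion and the normal equations are solved by Cramer's rule; both are sensible only for tiny $n$. The helper names `det`, `bordered` and `weighted_ls_error` are hypothetical.

```python
def det(mat):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(mat) == 1:
        return mat[0][0]
    return sum((-1) ** j * mat[0][j] * det([row[:j] + row[j + 1:] for row in mat[1:]])
               for j in range(len(mat)))


def bordered(A, b, lam):
    """Return L(lam), beta(lam), alpha(lam) and M(lam) as defined in (7)-(11)."""
    m, n = len(A), len(A[0])
    A = [[complex(a) for a in row] for row in A]
    L = [[sum(lam[i] * A[i][j].conjugate() * A[i][k] for i in range(m)) for k in range(n)]
         for j in range(n)]
    beta = [sum(lam[i] * b[i] * A[i][j].conjugate() for i in range(m)) for j in range(n)]
    alpha = sum(lam[i] * abs(b[i]) ** 2 for i in range(m))
    M = [L[j] + [beta[j]] for j in range(n)] + [[bj.conjugate() for bj in beta] + [alpha]]
    return L, beta, alpha, M


def weighted_ls_error(A, b, lam):
    """psi_2^2(lam) from (16), using the normal equations (18) solved by Cramer's rule."""
    L, beta, _, _ = bordered(A, b, lam)
    n, dL = len(L), det(L)
    z = [det([row[:k] + [beta[j]] + row[k + 1:] for j, row in enumerate(L)]) / dL
         for k in range(n)]
    return sum(l * abs(b_i - sum(a * z_j for a, z_j in zip(row, z))) ** 2
               for l, b_i, row in zip(lam, b, A))


A = [[1, 1j], [1, 0], [2, -1], [0.5j, 1]]   # every pair of rows is independent (Haar)
b = [1, 2j, 0.5, -1 + 1j]
lam = [0.1, 0.2, 0.3, 0.4]
L, _, _, M = bordered(A, b, lam)
ratio = det(M) / det(L)                        # real up to rounding, since M and L are Hermitian
print(weighted_ls_error(A, b, lam), ratio.real)
```

The two printed values should agree to rounding error. This is exactly the content of (24) for a single $\lambda$; Theorem 3 then maximizes this quantity over $\Lambda$.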




If the Haar condition hypothesis on $A$ in Theorem 3 is replaced by $\operatorname{rank} A = n$, then the result (15) does not hold in general, because $\det L(\lambda^*)$ can vanish at a saddle point $(z^*, \lambda^*)$ for $\phi_2$. Consider the following example. Let $m = n + 1$. Let the first $n$ rows of $A$ be the identity matrix, and let the $m$th row of $A$ be identically 0. Let $b_1 = \cdots = b_n = 0$ and $b_m = \gamma > 0$. One saddle point $(z^*, \lambda^*)$ of $\phi_2$ is $z_1^* = \cdots = z_n^* = 0$ and $\lambda_1^* = \cdots = \lambda_n^* = 0$, $\lambda_{n+1}^* = 1$. The saddle value $\phi_2(z^*,\lambda^*) = \gamma > 0$, but the $n \times n$ matrix $L(\lambda^*)$ contains only zero entries and $\det L(\lambda^*) = 0$. Note that $\lambda^*$ is unique in this example.

The determinants in (24) are actually Gram determinants for the (possibly degenerate) positive semidefinite inner product on $\mathbb{C}^m$ defined by

$$(u, v) = \sum_{i=1}^{m} \lambda_i u_i \bar v_i, \qquad \lambda \ge 0.$$

For a definition of Gram determinants, their properties, and a nearly equivalent derivation of (24), see [9, pp. 176-187].


III.  PROOF  OF  THEOREM  1 


The proof applies the fundamental theorem of game theory to the function $\phi_p$. The variant of the fundamental theorem that is used here is stated for functions defined on Cartesian products of convex subsets of real Euclidean spaces. Although $\phi_p$ is defined on $\mathbb{C}^n \times \Lambda$, identifying each $z \in \mathbb{C}^n$ with the real vector $(\operatorname{Re} z, \operatorname{Im} z) \in \mathbb{R}^{2n}$ lets it be regarded as defined on $\mathbb{R}^{2n} \times \Lambda$ instead, without any loss of generality in what follows. The complex notation is retained for ease of exposition.

The function $\phi_p$ is continuous in both variables. As the next lemma shows, it is also a convex function in its first variable and a concave function in its second variable.

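Lemma 1. For $1 \le p < \infty$, the function $\phi_p$ is convex-concave on $\mathbb{C}^n \times \Lambda$; that is, for $\alpha + \beta = 1$, $\alpha \ge 0$, $\beta \ge 0$,

$$\phi_p(\alpha z + \beta w, \lambda) \le \alpha \phi_p(z,\lambda) + \beta \phi_p(w,\lambda),$$

$$\phi_p(z, \alpha\lambda + \beta\gamma) \ge \alpha \phi_p(z,\lambda) + \beta \phi_p(z,\gamma),$$

where $z$ and $w$ are elements of $\mathbb{C}^n$, and $\lambda$ and $\gamma$ are elements of $\Lambda$.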


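Proof. Let $t_i = b_i - \sum_j a_{ij} z_j$ and $s_i = b_i - \sum_j a_{ij} w_j$. Since $\alpha + \beta = 1$,

$$b_i - \sum_{j=1}^{n} a_{ij}(\alpha z_j + \beta w_j) = \alpha t_i + \beta s_i.$$

The definition of $\phi_p$, together with Minkowski's inequality applied to the vectors $(\lambda_i^{1/p} t_i)$ and $(\lambda_i^{1/p} s_i)$, gives

$$\phi_p(\alpha z + \beta w, \lambda) = \Big( \sum_{i=1}^{m} \lambda_i |\alpha t_i + \beta s_i|^p \Big)^{1/p} \le \alpha \Big( \sum_{i=1}^{m} \lambda_i |t_i|^p \Big)^{1/p} + \beta \Big( \sum_{i=1}^{m} \lambda_i |s_i|^p \Big)^{1/p} = \alpha \phi_p(z,\lambda) + \beta \phi_p(w,\lambda),$$

which proves that $\phi_p$ is a convex function in its first argument for each $\lambda$. To prove that $\phi_p$ is a concave function in its second argument, fix $z$ and let $Q_\lambda = \phi_p(z,\lambda)$ and $Q_\gamma = \phi_p(z,\gamma)$. The case $p = 1$ is obvious, since $\phi_1(z,\lambda)$ is linear in $\lambda$. So assume $1 < p < \infty$ and let $q$ satisfy $1/p + 1/q = 1$. Then, by Hölder's inequality,

$$\begin{aligned} \alpha \phi_p(z,\lambda) + \beta \phi_p(z,\gamma) &= \alpha Q_\lambda + \beta Q_\gamma = (\alpha^{1/p} Q_\lambda)(\alpha^{1/q}) + (\beta^{1/p} Q_\gamma)(\beta^{1/q}) \\ &\le \{ (\alpha^{1/p} Q_\lambda)^p + (\beta^{1/p} Q_\gamma)^p \}^{1/p} \{ (\alpha^{1/q})^q + (\beta^{1/q})^q \}^{1/q} \\ &= \{ \alpha Q_\lambda^p + \beta Q_\gamma^p \}^{1/p} \\ &= \Big( \alpha \sum_{i=1}^{m} \lambda_i |t_i|^p + \beta \sum_{i=1}^{m} \gamma_i |t_i|^p \Big)^{1/p} \\ &= \phi_p(z, \alpha\lambda + \beta\gamma). \end{aligned}$$

This completes the proof of the lemma. ■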

The following theorem is due to H. Kneser [10]. See also [11, pp. 8-13].




Theorem 4. Let $E \subset \mathbb{R}^s$ and $F \subset \mathbb{R}^r$ be convex sets. Let the function $f : E \times F \to \mathbb{R}$ be convex on $E$ for each fixed $y \in F$ and concave on $F$ for each fixed $x \in E$. If one of the sets $E$ and $F$ is compact, and if the function $f$ is continuous in the corresponding variable, then

$$\sup_{y \in F} \inf_{x \in E} f(x,y) = \inf_{x \in E} \sup_{y \in F} f(x,y).$$

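Applying Kneser's theorem to $\phi_p$, with the compact set $F = \Lambda$, gives

$$\sup_{\lambda \in \Lambda} \inf_{z \in \mathbb{C}^n} \phi_p(z,\lambda) = \inf_{z \in \mathbb{C}^n} \sup_{\lambda \in \Lambda} \phi_p(z,\lambda).$$

Now, by a standard argument, for each $\lambda$ the infimum $\inf_{z \in \mathbb{C}^n} \phi_p(z,\lambda)$ is attained for some $z$. Furthermore, the resulting function of $\lambda$ is continuous on the compact set $\Lambda$ and so attains its supremum. Thus, the sup inf can be replaced by max min. Similarly, for each $z$, the supremum $\sup_{\lambda \in \Lambda} \phi_p(z,\lambda)$ is attained for some $\lambda$, since $\phi_p$ is continuous on the compact set $\Lambda$. By a standard argument, the resulting function of $z$ attains its infimum. Hence,

$$\max_{\lambda \in \Lambda} \min_{z \in \mathbb{C}^n} \phi_p(z,\lambda) = \min_{z \in \mathbb{C}^n} \max_{\lambda \in \Lambda} \phi_p(z,\lambda). \qquad (25)$$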

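The existence of a saddle point follows immediately. Choose $\lambda^*$ such that

$$\min_{z \in \mathbb{C}^n} \phi_p(z,\lambda^*) = \max_{\lambda \in \Lambda} \min_{z \in \mathbb{C}^n} \phi_p(z,\lambda).$$

Choose $z^*$ such that

$$\max_{\lambda \in \Lambda} \phi_p(z^*,\lambda) = \min_{z \in \mathbb{C}^n} \max_{\lambda \in \Lambda} \phi_p(z,\lambda).$$

By (25), these two extreme values are equal. Call their common value $v$. Then $\phi_p(z,\lambda^*) \ge v \ge \phi_p(z^*,\lambda)$ for all $z$ and $\lambda$, and choosing $z = z^*$, $\lambda = \lambda^*$ shows that $v = \phi_p(z^*,\lambda^*)$. Hence

$$\phi_p(z,\lambda^*) \ge \phi_p(z^*,\lambda^*) \quad \text{for all } z \in \mathbb{C}^n$$

and

$$\phi_p(z^*,\lambda) \le \phi_p(z^*,\lambda^*) \quad \text{for all } \lambda \in \Lambda.$$

By definition, the last two inequalities show that $(z^*, \lambda^*)$ is a saddle point of $\phi_p$. This completes the proof of Theorem 1.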


IV.  SYLVESTER’S  PROBLEM 

It is worth observing the form (15) takes in the special case of finding the best complex constant to fit given complex data. Specifically, find $z^* \in \mathbb{C}$ such that

$$\max_{1 \le i \le m} |b_i - z^*| = \min_{z \in \mathbb{C}} \max_{1 \le i \le m} |b_i - z|. \qquad (26)$$

This problem is equivalent to a problem posed by J. J. Sylvester in 1857: given $m$ points in the plane, find the smallest circle containing them all. The center of the smallest circle is the constant $z^*$, and the radius is the min max in (26). In this case, $A$ is the $m \times 1$ column of ones, the matrix $L(\lambda) = [\lambda_1 + \cdots + \lambda_m] = [1]$ is the $1 \times 1$ identity matrix, and $M(\lambda)$ is the $2 \times 2$ matrix

$$M(\lambda) = \begin{bmatrix} 1 & \sum_{i=1}^{m} \lambda_i b_i \\ \sum_{i=1}^{m} \lambda_i \bar b_i & \sum_{i=1}^{m} \lambda_i |b_i|^2 \end{bmatrix}.$$


Hence, from (15), we need to compute

$$\max_{\lambda \in \Lambda} \left( \sum_{i=1}^{m} \lambda_i |b_i|^2 - \Big| \sum_{i=1}^{m} \lambda_i b_i \Big|^2 \right)^{1/2}. \qquad (27)$$

The problem (27) is equivalent to the quadratic program:

QP. Minimize $\lambda^T G \lambda - c^T \lambda$ over $\lambda \in \mathbb{R}^m$, subject to $\lambda \ge 0$ and $\lambda_1 + \cdots + \lambda_m = 1$,

where the real matrix $G \in \mathbb{R}^{m\times m}$ is given by $G = \operatorname{Re}(b b^H) = (\operatorname{Re} b)(\operatorname{Re} b)^T + (\operatorname{Im} b)(\operatorname{Im} b)^T$, and the components of the real vector $c = (c_i) \in \mathbb{R}^m$ are given by $c_i = |b_i|^2$. It is easy to see that $G$ is a positive semidefinite matrix of rank at most 2. The objective function of QP is thus convex, and any locally optimal solution $\lambda^*$ of QP is a global solution. It then follows from the normal equations (18) that the best constant is given by

$$z^* = \sum_{i=1}^{m} \lambda_i^* b_i. \qquad (28)$$
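The Frank-Wolfe (conditional gradient) method gives a compact way to approximate the maximizing $\lambda^*$ of (27), and hence the center (28). It is swapped in here for brevity and is not the algorithm of [12]. Write $g(\lambda) = \sum_i \lambda_i |b_i|^2 - |\sum_i \lambda_i b_i|^2$ for the quantity under the square root, and let $c = \sum_i \lambda_i b_i$ be the current center. The best vertex of $\Lambda$ is then the data point farthest from $c$, and exact line search toward that vertex $k$ gives the step $t = (|b_k - c|^2 - g(\lambda))/(2|b_k - c|^2)$. The function name `sylvester_frank_wolfe` and the default iteration count are assumptions of this sketch. Because $\sqrt{g(\lambda)} = \psi_2(\lambda)$ cannot exceed the optimal radius, and $\max_i |b_i - c|$ cannot fall below it, the two returned numbers bracket the radius.

```python
def sylvester_frank_wolfe(points, iters=10000):
    """Approximately maximize (27) over the simplex by Frank-Wolfe with exact line search.

    Returns (center, lower, upper, lam): center = sum(lam_i * b_i) as in (28), while
    lower = sqrt(g(lam)) and upper = max_i |b_i - center| bracket the optimal radius.
    """
    m = len(points)
    lam = [1.0 / m] * m
    for _ in range(iters):
        c = sum(l * b for l, b in zip(lam, points))
        d2 = [abs(b - c) ** 2 for b in points]
        g = sum(l * d for l, d in zip(lam, d2))   # equals sum lam|b|^2 - |c|^2 since sum(lam) = 1
        k = max(range(m), key=d2.__getitem__)     # farthest point is the best Frank-Wolfe vertex
        if d2[k] - g <= 1e-15:                    # Frank-Wolfe gap closed: lam is optimal
            break
        t = (d2[k] - g) / (2.0 * d2[k])           # maximizer of g on the segment toward e_k
        lam = [(1.0 - t) * l for l in lam]
        lam[k] += t
    c = sum(l * b for l, b in zip(lam, points))
    lower = sum(l * abs(b - c) ** 2 for l, b in zip(lam, points)) ** 0.5
    upper = max(abs(b - c) for b in points)
    return c, lower, upper, lam


center, lower, upper, lam = sylvester_frank_wolfe([0, 2, 1 + 1j])
print(center, lower, upper)   # exact answer for these points: center 1, radius 1
```

Frank-Wolfe converges only sublinearly when the optimal $\lambda^*$ lies on a face of $\Lambda$, as it does here. The specialized method of [12] is therefore preferable when high accuracy is needed.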

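The maximum in (27) is positive when $m > 1$ and at least two of the data points $b_i$ are distinct. To see that the quantity under the square root in (27) is nonnegative, simply note that

$$\Big| \sum_{i=1}^{m} \lambda_i b_i \Big|^2 \le \Big( \sum_{i=1}^{m} \lambda_i^{1/2} \cdot \lambda_i^{1/2} |b_i| \Big)^2 \le \sum_{i=1}^{m} \lambda_i \sum_{i=1}^{m} \lambda_i |b_i|^2 = \sum_{i=1}^{m} \lambda_i |b_i|^2.$$

Suppose $\lambda$ puts positive weight on two distinct data points. Then equality cannot hold in both inequalities, because that would force all the weighted points to share the same argument and the same modulus. Hence the maximum is positive.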

Discussion  of  the  history  of  Sylvester’s  problem,  together  with  an  efficient 
computational  algorithm  for  its  solution,  is  given  in  [12].  The  algorithm  given 
there  solves  the  natural  extension  of  Sylvester’s  problem  to  data  points  given 
in  higher  dimensional  Euclidean  spaces. 

It  is  curious  that  a  simple  alteration  of  the  problem  substantially  alters  the 
difficulty  of  its  solution.  Instead  of  (26),  consider 

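$$\max_{1 \le i \le m} |b_i - a_i z^*| = \min_{z \in \mathbb{C}} \max_{1 \le i \le m} |b_i - a_i z| \qquad (29)$$

for $a = (a_i) \in \mathbb{C}^m$ given. The matrix $L(\lambda)$ is

$$L(\lambda) = [\lambda_1 |a_1|^2 + \cdots + \lambda_m |a_m|^2] \in \mathbb{C}^{1\times 1},$$

and the matrix $M(\lambda)$ is

$$M(\lambda) = \begin{bmatrix} \sum_{i=1}^{m} \lambda_i |a_i|^2 & \sum_{i=1}^{m} \lambda_i b_i \bar a_i \\ \sum_{i=1}^{m} \lambda_i \bar b_i a_i & \sum_{i=1}^{m} \lambda_i |b_i|^2 \end{bmatrix} \in \mathbb{C}^{2\times 2}.$$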


Clearly, computing the maximum of $\det M(\lambda)/\det L(\lambda)$ is not simply equivalent to a quadratic program in this case. (Computationally, however, it may be solvable by parametric quadratic programming methods.) Given a solution vector $\lambda^*$, then

$$z^* = \frac{\sum_{i=1}^{m} \lambda_i^* b_i \bar a_i}{\sum_{i=1}^{m} \lambda_i^* |a_i|^2} \qquad (30)$$

is a solution of (29), as can be seen from the normal equations (18).

When all the given data points $b_i$ are real, the problem (26) has a trivial solution. Let $r$ and $s$ be indices for which $\min_i b_i$ and $\max_i b_i$ occur, respectively. The best constant $z^*$ in (26) is real and $z^* = (b_r + b_s)/2$. Considering (28), it is evident that a solution of (27) is $\lambda_r^* = \lambda_s^* = 1/2$ with all other $\lambda_i^* = 0$. The maximum in (27) is thus equal to $|b_r - b_s|/2$, a fact not immediately apparent from (27) itself.


V.  NEW  PROOF  OF  DE  LA  VALLEE  POUSSIN’S  THEOREM 

Let $A \in \mathbb{C}^{(n+1)\times n}$, and let $A_i$ denote the $n \times n$ matrix obtained from $A$ by deleting the $i$th row, $i = 1, \ldots, n + 1$. De la Vallée Poussin [6] proves the following result for real systems. Rivlin and Shapiro [7, pp. 692-694] show that it also holds for complex systems.

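Theorem 5. Let $A \in \mathbb{C}^{(n+1)\times n}$ satisfy the Haar condition for matrices, and let $b \in \mathbb{C}^{n+1}$. Then

$$\min_{z \in \mathbb{C}^n} \|b - Az\|_\infty = \frac{\Big| \sum_{i=1}^{n+1} (-1)^i b_i \det A_i \Big|}{\sum_{i=1}^{n+1} |\det A_i|}. \qquad (31)$$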


We use Theorem 3 above to derive (31). The procedure is to solve the maximum problem (15) explicitly for $\lambda^*$ and, from $\lambda^*$, deduce (31). As a side benefit, once $\lambda^*$ is known, the Chebyshev solution vector $z^*$ can be constructed numerically by solving the $\lambda^*$-weighted least squares problem. In principle, this is equivalent to solving the normal equations (18) with $\lambda = \lambda^*$.




In contrast to the proof based on Theorem 3, the original proof of (31) proceeds by minimizing the number $Q = \|b - Az\|_\infty$ directly. It turns out that this can be done relatively easily and in such a way that the Chebyshev solution $z^*$ can be constructed. Thus, the original proof and the proof based on Theorem 3 solve the “primal” and the “dual” problems, respectively.

We  first  establish  a  general  algebraic  identity  concerning  determinants. 
Special  cases  of  this  identity  will  be  used  in  the  solution  of  (15). 

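Lemma 2. For $i = 1, \ldots, m$, let $\lambda_i \in \mathbb{C}$, $x_i \in \mathbb{C}^n$, and $y_i \in \mathbb{C}^n$. Then, for $m \ge n \ge 1$,

$$\det\Big( \sum_{i=1}^{m} \lambda_i x_i y_i^T \Big) = \sum \Big( \prod_{t=1}^{n} \lambda_{i(t)} \Big) \det[x_{i(1)}, \ldots, x_{i(n)}] \det[y_{i(1)}, \ldots, y_{i(n)}], \qquad (32)$$

where $[x_{i(1)}, \ldots, x_{i(n)}]$ and $[y_{i(1)}, \ldots, y_{i(n)}]$ denote the $n \times n$ matrices whose $t$th columns are $x_{i(t)}$ and $y_{i(t)}$, respectively, $t = 1, \ldots, n$. The sum is over all indices $i(1), \ldots, i(n)$ such that $1 \le i(1) < i(2) < \cdots < i(n) \le m$. For $n > m \ge 1$, the determinant on the left hand side of (32) is identically zero.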

Proof. Let $X \in \mathbb{C}^{n\times m}$ and $Y \in \mathbb{C}^{n\times m}$ denote the matrices whose $t$th columns are $\lambda_t x_t$ and $y_t$, respectively, $t = 1, \ldots, m$. Then

$$\det\Big( \sum_{i=1}^{m} \lambda_i x_i y_i^T \Big) = \det(X Y^T).$$

The Binet-Cauchy formula [13, pp. 8-10] for the determinant of the product of two rectangular matrices has two cases. For $n > m \ge 1$, it states that $\det(XY^T) = 0$. For $m \ge n \ge 1$, in the present notation, it gives $\det(XY^T) = \sum \det[\lambda_{i(1)} x_{i(1)}, \ldots, \lambda_{i(n)} x_{i(n)}] \det[y_{i(1)}, \ldots, y_{i(n)}]$, with the sum ranging over all indices with $1 \le i(1) < i(2) < \cdots < i(n) \le m$. The identity (32) follows immediately by factoring $\lambda_{i(t)}$ out of column $t$ in the first determinant, and noting that a matrix and its transpose have equal determinants. This concludes the proof. ■
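Identity (32) is easy to spot-check numerically for small sizes. The sketch below is a minimal pure-Python illustration with hypothetical helper names (`det`, `lhs_32`, `rhs_32`). It forms $\det(\sum_i \lambda_i x_i y_i^T)$ directly and compares the result with the Binet-Cauchy sum over increasing index tuples $i(1) < \cdots < i(n)$, which are generated with `itertools.combinations`.

```python
from itertools import combinations


def det(mat):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(mat) == 1:
        return mat[0][0]
    return sum((-1) ** j * mat[0][j] * det([row[:j] + row[j + 1:] for row in mat[1:]])
               for j in range(len(mat)))


def lhs_32(lam, xs, ys):
    """det(sum_i lam_i x_i y_i^T) for vectors x_i, y_i in C^n."""
    n = len(xs[0])
    S = [[sum(l * x[r] * y[c] for l, x, y in zip(lam, xs, ys)) for c in range(n)]
         for r in range(n)]
    return det(S)


def rhs_32(lam, xs, ys):
    """Binet-Cauchy sum on the right-hand side of (32); the sum is empty (zero) when n > m."""
    n = len(xs[0])
    total = 0
    for idx in combinations(range(len(lam)), n):      # 1 <= i(1) < ... < i(n) <= m
        weight = 1
        for i in idx:
            weight *= lam[i]
        X = [[xs[i][r] for i in idx] for r in range(n)]  # columns x_{i(1)}, ..., x_{i(n)}
        Y = [[ys[i][r] for i in idx] for r in range(n)]
        total += weight * det(X) * det(Y)
    return total


lam = [1, 2j, -0.5, 3]
xs = [[1, 2], [1j, -1], [0.5, 1 + 1j], [2, 0]]
ys = [[0, 1], [1, 1j], [-1, 2], [1j, 3]]
print(lhs_32(lam, xs, ys), rhs_32(lam, xs, ys))   # the two values should agree
```

With a single rank-one term ($m = 1 < n = 2$), the left side vanishes and the right-hand sum is empty, which matches the second case of the lemma.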

If $y_i = \bar x_i$ for all $i$, then

$$\det\Big( \sum_{i=1}^{m} \lambda_i x_i x_i^H \Big) = \sum \Big( \prod_{t=1}^{n} \lambda_{i(t)} \Big) \big| \det[x_{i(1)}, \ldots, x_{i(n)}] \big|^2. \qquad (33)$$

It is an important fact, implicitly used in the next lemma, that the coefficient of each product $\lambda_{i(1)} \cdots \lambda_{i(n)}$ in (33) is nonnegative. Note also that (33) is nonnegative when all $\lambda_i \ge 0$.


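Lemma 3. Let $n \ge 1$. For $i = 1, \ldots, n+1$, let $\lambda_i \in \mathbb{R}$, $x_i \in \mathbb{C}^n$, $y_i \in \mathbb{C}^{n+1}$. If every subset of $n$ of the vectors $x_1, \ldots, x_{n+1}$ is linearly independent, then the ratio

$$\frac{\det\Big( \sum_{i=1}^{n+1} \lambda_i y_i y_i^H \Big)}{\det\Big( \sum_{i=1}^{n+1} \lambda_i x_i x_i^H \Big)} \qquad (34)$$

attains its maximum over all $\lambda \in \Lambda$ for which the denominator is nonzero. A maximizing vector is

$$\lambda_i^* = \frac{|\det S_i|}{\sum_{k=1}^{n+1} |\det S_k|}, \qquad i = 1, \ldots, n+1, \qquad (35)$$

where $S_i$ denotes the $n \times n$ matrix $[x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{n+1}]$, $i = 1, \ldots, n+1$. The maximum value of (34) is

$$\left( \frac{|\det[y_1, \ldots, y_{n+1}]|}{|\det S_1| + \cdots + |\det S_{n+1}|} \right)^2. \qquad (36)$$

The vector $\lambda^*$ is unique if and only if $\det[y_1, \ldots, y_{n+1}] \ne 0$.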


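Proof. From (33), since each $x_i \in \mathbb{C}^n$ and $m = n + 1$,

$$\det\Big( \sum_{i=1}^{n+1} \lambda_i x_i x_i^H \Big) = \sum_{i=1}^{n+1} \Big( \prod_{t=1,\, t \ne i}^{n+1} \lambda_t \Big) |\det S_i|^2, \qquad (37)$$

where $S_i$ is as in the lemma statement. Because the vectors $\{x_t,\ t = 1, \ldots, n+1,\ t \ne i\}$ are linearly independent, $\det S_i \ne 0$. Consequently, the vector $\lambda^*$ given by (35) is a well-defined element of $\Lambda$, and the determinant (37) does not vanish for $\lambda = \lambda^*$. From (33), since each $y_i \in \mathbb{C}^{n+1}$ and $m = n + 1$,

$$\det\Big( \sum_{i=1}^{n+1} \lambda_i y_i y_i^H \Big) = \Big( \prod_{i=1}^{n+1} \lambda_i \Big) |\det[y_1, \ldots, y_{n+1}]|^2. \qquad (38)$$

If $\det[y_1, \ldots, y_{n+1}]$ vanishes, then the ratio (34) is identically zero and $\lambda^*$ may as well be selected as the maximizing vector. Suppose now that $\det[y_1, \ldots, y_{n+1}] \ne 0$. The maximum ratio of the determinants in (34) is therefore positive. Hence, from (38), we may restrict attention to vectors $\lambda > 0$. Dividing numerator and denominator by $\prod_t \lambda_t$, the ratio (34) can now be written as $|\det[y_1, \ldots, y_{n+1}]|^2 / f(\lambda)$, where

$$f(\lambda) = \sum_{i=1}^{n+1} \frac{|\det S_i|^2}{\lambda_i}.$$

Maximizing the ratio is equivalent to minimizing $f(\lambda)$ subject to $\lambda \in \Lambda$. A minimizing vector is necessarily positive in this case, so we form the Lagrangian

$$\Psi(\lambda, \mu) = f(\lambda) + \mu(\lambda_1 + \cdots + \lambda_{n+1} - 1).$$

Stationary points of $\Psi$ satisfy

$$\frac{\partial \Psi}{\partial \lambda_i} = -\frac{|\det S_i|^2}{\lambda_i^2} + \mu = 0, \quad i = 1, \ldots, n+1, \qquad \frac{\partial \Psi}{\partial \mu} = \lambda_1 + \cdots + \lambda_{n+1} - 1 = 0.$$

These equations imply that $\mu > 0$, that

$$\lambda_i = \frac{|\det S_i|}{\sqrt{\mu}}, \qquad i = 1, \ldots, n+1,$$

that $\sqrt{\mu} = |\det S_1| + \cdots + |\det S_{n+1}|$, and that $\lambda^*$ is the only stationary point. Since $f$ is convex on the positive orthant (each term $|\det S_i|^2/\lambda_i$ is convex for $\lambda_i > 0$), this stationary point is a minimizing point for $f$. The minimum value of $f$ is

$$f(\lambda^*) = \sum_{i=1}^{n+1} \frac{|\det S_i|^2}{\lambda_i^*} = \sqrt{\mu} \sum_{i=1}^{n+1} \frac{|\det S_i|^2}{|\det S_i|} = \Big( \sum_{i=1}^{n+1} |\det S_i| \Big)^2,$$

so the maximum value of the ratio of determinants is as claimed. This completes the proof. ■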

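The proof of de la Vallée Poussin's result (31) is now easy. From (7) and (8), since $A \in \mathbb{C}^{(n+1)\times n}$, the matrix

$$L(\lambda) = \sum_{i=1}^{n+1} \lambda_i R_i^H R_i \in \mathbb{C}^{n\times n},$$

where $R_i$ denotes the $i$th row of the matrix $A$. Similarly, from (9)-(11),

$$M(\lambda) = \sum_{i=1}^{n+1} \lambda_i \hat R_i^H \hat R_i \in \mathbb{C}^{(n+1)\times(n+1)},$$

where $\hat R_i$ denotes the $i$th row of the augmented matrix $[A \ b] \in \mathbb{C}^{(n+1)\times(n+1)}$. Lemma 3 applies with $x_i = R_i^H$ and $y_i = \hat R_i^H$, because the Haar condition guarantees that every $n$ of the $x_i$ are linearly independent. It gives

$$\max_{\lambda \in \Lambda} \left( \frac{\det M(\lambda)}{\det L(\lambda)} \right)^{1/2} = \frac{|\det[\hat R_1^H, \ldots, \hat R_{n+1}^H]|}{\sum_{i=1}^{n+1} |\det[R_1^H, \ldots, R_{i-1}^H, R_{i+1}^H, \ldots, R_{n+1}^H]|}. \qquad (39)$$

Since the determinant of the conjugate transpose of a matrix is the conjugate of its determinant, we have

$$|\det[R_1^H, \ldots, R_{i-1}^H, R_{i+1}^H, \ldots, R_{n+1}^H]| = |\det A_i|,$$

where $A_i$ is defined as in (31). Likewise $|\det[\hat R_1^H, \ldots, \hat R_{n+1}^H]| = |\det[A \ b]|$. Expanding $\det[A \ b]$ along its last column shows that the right hand side of (39) is identically the right hand side of (31). That the left hand sides are also equal follows from Theorem 3.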

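When $\det[A \ b] \ne 0$, so that the saddle value is positive, the unique $\lambda^* \in \Lambda$ for which the maximum (39) occurs is

$$\lambda_i^* = \frac{|\det A_i|}{\sum_{k=1}^{n+1} |\det A_k|}, \qquad i = 1, \ldots, n+1.$$

Consequently, the Chebyshev solution $z^*$ of $Az = b$ is the (unique) solution of the $\lambda^*$-weighted least squares problem for $Az = b$.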
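The recipe of this section can be followed directly for small systems: compute the $\det A_i$, form $\lambda^*$, then solve the $\lambda^*$-weighted normal equations (18). The following is a minimal pure-Python sketch under the hypotheses of Theorem 5 (Haar condition, $m = n + 1$). The helper name `vallee_poussin` is hypothetical, and Cramer's rule is used only because $n$ is tiny. With 0-based indexing every sign $(-1)^i$ in (31) is flipped at once, which leaves the modulus unchanged.

```python
def det(mat):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(mat) == 1:
        return mat[0][0]
    return sum((-1) ** j * mat[0][j] * det([row[:j] + row[j + 1:] for row in mat[1:]])
               for j in range(len(mat)))


def vallee_poussin(A, b):
    """Return (E, lam, z): the minimax error (31), the weights lambda*, and z*."""
    m, n = len(A), len(A[0])
    assert m == n + 1, "formula (31) needs exactly n + 1 equations in n unknowns"
    A = [[complex(a) for a in row] for row in A]
    dets = [det(A[:i] + A[i + 1:]) for i in range(m)]   # det A_i: row i deleted
    total = sum(abs(d) for d in dets)                      # nonzero under the Haar condition
    E = abs(sum((-1) ** i * b[i] * dets[i] for i in range(m))) / total
    lam = [abs(d) / total for d in dets]
    # lambda*-weighted normal equations (18), L z = beta, solved by Cramer's rule
    L = [[sum(lam[i] * A[i][j].conjugate() * A[i][k] for i in range(m)) for k in range(n)]
         for j in range(n)]
    beta = [sum(lam[i] * b[i] * A[i][j].conjugate() for i in range(m)) for j in range(n)]
    dL = det(L)
    z = [det([row[:k] + [beta[j]] + row[k + 1:] for j, row in enumerate(L)]) / dL
         for k in range(n)]
    return E, lam, z


A = [[1, 0], [1, 1], [1, 2]]   # fit c0 + c1*x to data at x = 0, 1, 2
b = [0, 1, 0]
E, lam, z = vallee_poussin(A, b)
print(E, lam, z)   # E = 0.5 and lam = [0.25, 0.5, 0.25]; z is approximately [0.5, 0]
```

Since every $\lambda_i^*$ is positive, the saddle point property forces all $n + 1$ residual moduli $|b_i - (Az^*)_i|$ to equal the minimax error. This gives a convenient check on the computed $z^*$.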



VI.  CONCLUDING  REMARKS 


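The general Chebyshev problem (2) can be posed as a convex optimization problem in the following way.

Problem. Minimize $\epsilon$ over $(\epsilon, z)$, $\epsilon \in \mathbb{R}$, $z \in \mathbb{C}^n$, subject to

$$\Big| b_i - \sum_{j=1}^{n} a_{ij} z_j \Big| \le \epsilon, \qquad i = 1, \ldots, m.$$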


Its  convex  dual  can  be  developed  in  a  manner  similar  to,  but  more  general 
than,  that  pursued  in  [12]  for  Sylvester’s  problem.  This  approach  is  based  on 
Wolfe’s  dual  and  the  Kuhn-Tucker  conditions.  It  yields  Theorem  3  after 
somewhat  tedious,  but  insightful,  algebraic  manipulations.  In  particular,  the 
gradient  equation  of  the  Kuhn-Tucker  conditions  turns  out  to  be  the  system  of 
normal  equations  of  the  weighted  least  squares  problem. 

Theorem 2 can be extended to “weighted Chebyshev” solutions of $Az = b$. Let $w = (w_i) \in \mathbb{R}^m$ be any nonnegative vector, and define

$$\phi_p(z, \lambda; w) = \left( \sum_{i=1}^{m} \lambda_i \Big( w_i \Big| b_i - \sum_{j=1}^{n} a_{ij} z_j \Big| \Big)^p \right)^{1/p}, \qquad p \ge 1.$$

The proof of Theorem 1 can be trivially modified to show that $\phi_p(z,\lambda;w)$ has a saddle point $(z^*, \lambda_w^*)$ on $\mathbb{C}^n \times \Lambda$ for every nonnegative weight vector $w$. Consequently, as in the proof of Theorem 2, $z^*$ is a $w$-weighted Chebyshev solution of $Az = b$ and also a $\lambda_w^*$-weighted least $p$th power solution of $Az = b$. Further generalization of Theorem 2 to nonlinear systems may be possible by replacing, in the definition (1) of $\phi_p$, the functions $f_i(z) = |b_i - \sum_j a_{ij} z_j|$ with more general real valued convex functions of $z$.


REFERENCES 

1  T.  S.  Motzkin  and  J.  L.  Walsh,  Polynomials  of  best  approximation  on  an  interval, 
Proc.  Nat.  Acad.  Sci.  U.S.A.  45:1523-1528  (1959). 




2 T. S. Motzkin and J. L. Walsh, Polynomials of best approximation on a real finite point set I, Trans. Amer. Math. Soc. 91:231-245 (1959).

3 C. L. Lawson, Contributions to the theory of linear least maximum approximations, Ph.D. Thesis, UCLA, 1961.

4  J.  R.  Rice,  The  Approximation  of  Functions,  Vol.  2,  Addison-Wesley,  1969. 

5  J.  R.  Rice  and  K.  H.  Usow,  The  Lawson  algorithm  and  extensions,  Math.  Comp. 
22:118-127  (1968). 

6 C. J. de la Vallée Poussin, Sur la méthode de l'approximation minimum, Ann. Soc. Sci. Bruxelles, Seconde Partie, Mémoires 35:1-16 (1911).

7  T.  J.  Rivlin  and  H.  S.  Shapiro,  A  unified  approach  to  certain  problems  of 
approximation  and  minimization,  J.  Soc.  Indust.  Appl.  Math.  9:670-699  (1961). 

8  A.  K.  Cline,  Rate  of  convergence  of  Lawson’s  algorithm,  Math.  Comp.  26:167-176 
(1972). 

9  P.  J.  Davis,  Interpolation  and  Approximation,  Dover,  1975. 

10 H. Kneser, Sur un théorème fondamental de la théorie des jeux, C. R. Acad. Sci. Paris 234:2418-2420 (1952).

11  E.  G.  Gol’stein,  Theory  of  Convex  Programming,  Translations  of  Mathematical 
Monographs,  Vol.  36,  Amer.  Math.  Soc.,  Providence,  R.I.,  1972. 

12 D. J. Elzinga and D. W. Hearn, The minimal covering sphere problem, Management Sci. 19:96-104 (1972).

13 F. R. Gantmacher, The Theory of Matrices, Vol. I, Chelsea, 1960.

14  M.  Avriel,  Nonlinear  Programming,  Prentice-Hall,  1976. 

Received IT August 1983; revised 6 March 1984




A  Note  On  The  Semi-Infinite  Programming  Approach 
To  Complex  Approximation 

R.  L.  Streit  and  A.  H.  Nuttall 




MATHEMATICS OF COMPUTATION, VOLUME 40, NUMBER 162, APRIL 1983, PAGES 599-605


A  Note  on  the  Semi-Infinite  Programming 
Approach  to  Complex  Approximation 

By  Roy  L.  Streit  and  Albert  H.  Nuttall 

Abstract. Several observations are made about a recently proposed semi-infinite programming (SIP) method for computation of linear Chebyshev approximations to complex-valued functions. A particular discretization of the SIP problem is shown to be equivalent to replacing the usual absolute value of a complex number with related estimates. This results in a class of quasi-norms on the complex number field $\mathbb{C}$, and consequently a class of quasi-norms on the space $C(Q)$ of all continuous functions defined on $Q \subset \mathbb{C}$, $Q$ compact. These quasi-norms on $C(Q)$ are estimates of the $L_\infty$ norm on $C(Q)$. They are useful because the best approximation problem in each quasi-norm can be solved by solving (i) an ordinary linear program if $Q$ is finite or (ii) a simplified SIP if $Q$ is not finite.


Glashoff and Roleff [1] solve a semi-infinite program (SIP) which is shown to be equivalent to the linear approximation problem for functions in $C(Q)$. Here $C(Q)$ is the space of complex-valued continuous functions on a compact (and not necessarily finite) subset $Q$ of the complex plane $\mathbb{C}$, equipped with the uniform ($L_\infty$) norm

$$\|f\|_\infty = \max_{z \in Q} |f(z)|. \qquad (1)$$

Their method is a two-step procedure. The first step applies the usual simplex method of linear programming to solve a discrete approximation of the SIP. The second step uses the end result of the first step as the starting point of a Newton-Raphson iteration, which solves a certain system of nonlinear algebraic equations whose solution (if feasible) is a solution to the linear approximation problem (Problem 1 below). The purpose of this note is to make some observations about the linear program of their discrete first step, observations that closely connect its solution with the solution of the approximation problem. A knowledge of the SIP definition and solution method is not needed to understand the results presented here. The interested reader is referred to [1], [2], [3], and to their bibliographies. We point out that Theorems 1 and 2 were first proved in [4]. There, a method identical to the first step of Glashoff and Roleff's procedure for finite $Q$ was discovered independently of [1] and of semi-infinite programming. Readers interested in practical examples and an engineering application of linear complex approximation are also referred to [4].


Received February 24, 1982.

1980 Mathematics Subject Classification. Primary 65D15, 65E05, 65K05; Secondary 30C30, 41A50,
3oax: 




ROY L. STREIT AND ALBERT H. NUTTALL


Let $h_1(z), \ldots, h_n(z)$ and $f(z)$ be given functions in $C(Q)$. For any set of complex parameters $a = \{a_1, \ldots, a_n\}$, define

$$L(a; z) = \sum_{k=1}^{n} a_k h_k(z). \qquad (2)$$

Problem 1. Compute a set of complex parameters $a^* = \{a_1^*, \ldots, a_n^*\}$ such that, for all parameter sets $a$,

$$\|f - L(a^*; z)\|_\infty \le \|f - L(a; z)\|_\infty. \qquad (3)$$

We set

$$E_n(f) = \|f - L(a^*; z)\|_\infty. \qquad (4)$$

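Let $p \ge 2$ be a positive integer. Define the angles

$$\theta_j = \pi(j-1)/p, \qquad j = 1, 2, \ldots, 2p,$$

and let $S_p = \{\theta_j\}$. Define, for any complex number $z$,

$$|z|_p = \max_{\theta \in S_p} \{ \operatorname{Re}(z) \cos\theta + \operatorname{Im}(z) \sin\theta \}. \qquad (5)$$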


It may be readily verified that

(i) $|z|_p \ge 0$, and $|z|_p = 0$ if and only if $z = 0$;

(ii) $|z + w|_p \le |z|_p + |w|_p$ for all complex $z$ and $w$;

(iii) given complex $a$, $|az|_p = |a| \cdot |z|_p$ for all $z$ if and only if $\arg a \in S_p$;

(iv) for $a$ and $a_n$ complex, $|-z|_p = |z|_p$, $\lim_{a_n \to 0} |a_n z|_p = 0$, and $\lim_{|z_n|_p \to 0} |a z_n|_p = 0$.

Thus $|z|_p$ is not a norm on $\mathbb{C}$ because (iii) is not sufficiently strong; however, it is a quasi-norm because of (i), (ii), and (iv). See [5, pp. 30-32]. From the well-known identity

$$|z| = \max_{0 \le \theta < 2\pi} \{ \operatorname{Re}(z) \cos\theta + \operatorname{Im}(z) \sin\theta \}, \qquad (6)$$

it follows that $|z|_p \le |z|$. In addition it can be shown that

$$|z|_p \le |z| \le |z|_p \sec(\pi/2p), \qquad p \ge 2, \qquad (7)$$

for all complex $z$. To see (7), it is helpful to visualize the set of all $z$ in $\mathbb{C}$ such that $|z|_p = 1$ as an equilateral polygon of $2p$ sides whose inscribed circle is the unit circle $|z| = 1$. The vertices of this polygon lie at distance $\sec(\pi/2p)$ from the origin.
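The quasi-norm (5) and the bounds (7) are simple to evaluate. The following minimal Python sketch uses a hypothetical function name, `quasi_norm`. It computes $|z|_p$ from the $2p$ angles $\theta_j = \pi(j-1)/p$, spot-checks (7) on a ring of test points, and illustrates property (iii): rotating $z$ by an angle outside $S_p$ can change $|z|_p$.

```python
import cmath
import math


def quasi_norm(z, p):
    """|z|_p of (5): the largest projection of z onto the 2p directions pi*j/p, j = 0..2p-1."""
    return max(z.real * math.cos(math.pi * j / p) + z.imag * math.sin(math.pi * j / p)
               for j in range(2 * p))


def check_bounds(p, samples=360):
    """Spot-check |z|_p <= |z| <= |z|_p * sec(pi/(2p)) from (7) on points of modulus 1.7."""
    sec = 1.0 / math.cos(math.pi / (2 * p))
    for k in range(samples):
        z = 1.7 * cmath.exp(2j * math.pi * k / samples)
        q = quasi_norm(z, p)
        if not (q <= abs(z) + 1e-12 and abs(z) <= q * sec + 1e-12):
            return False
    return True


print(quasi_norm(1 + 1j, 2), abs(1 + 1j))   # a vertex of the square: |z| = |z|_2 * sec(pi/4)
print(quasi_norm(1, 2), quasi_norm(cmath.exp(1j * math.pi / 4), 2))  # pi/4 is not in S_2
```

The second line shows why $|z|_p$ is only a quasi-norm. Both arguments have modulus 1, yet the rotated point has $|z|_2 = \cos(\pi/4)$.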

It is easy to verify that

$$\|f\|_p = \max_{z \in Q} |f(z)|_p \qquad (8)$$

is a quasi-norm on $C(Q)$ for each integer $p \ge 2$. Further, from (7),

$$\|f\|_p \le \|f\|_\infty \le \|f\|_p \sec(\pi/2p). \qquad (9)$$

We now define a new (partially) discretized version of Problem 1.

Problem 2. Fix $p \ge 2$. Compute a set of complex parameters $a^{**} = \{a_1^{**}, \ldots, a_n^{**}\}$ such that, for all parameter sets $a$,

$$\|f - L(a^{**}; z)\|_p \le \|f - L(a; z)\|_p. \qquad (10)$$

We set

$$E_{np}(f) = \|f - L(a^{**}; z)\|_p. \qquad (11)$$


Theorem  1 .  E„p(  f )  *;  £„(  f )  «  Enp(  /  )sec( ). 

Proof.  We  have 

E„p(f)=\\f-  L(a*-:)\\p<\\f-  L(a”;  .-Ml,  <  II/-  L(a'-z)  II  x  =  £„(  /  ) 

«  II/- Ha" :  OH II  / -  /.(«-.  -- )ll , secj  ^  )  =  £„(  /  )sec(  j-  ) . 

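Theorem 2. $E_n(f) \le \|f - L(a^{**}; z)\|_\infty \le E_n(f)\sec(\pi/2p)$.

Proof.

$$\begin{aligned}
E_n(f) &= \|f - L(a^{*}; z)\|_\infty \le \|f - L(a^{**}; z)\|_\infty \le \|f - L(a^{**}; z)\|_p \sec\left(\frac{\pi}{2p}\right) \\
&\le \|f - L(a^{*}; z)\|_p \sec\left(\frac{\pi}{2p}\right) \le E_n(f)\sec\left(\frac{\pi}{2p}\right).
\end{aligned}$$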

Corollary 1. $E_{np}(f) \le \|f - L(a^{**}; z)\|_\infty \le E_{np}(f)\sec(\pi/2p)$.

Corollary 2. For each $p \ge 2$, $E_n(f) = 0$ if and only if $E_{np}(f) = 0$.


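Corollary 3. If $E_n(f) \ne 0$, then

$$0 \le \frac{\|f - L(a^{**}; z)\|_\infty - E_n(f)}{E_n(f)} \le \frac{\pi^2}{8p^2} + O\left(\frac{1}{p^4}\right), \tag{12}$$

and the upper bound is independent of the compact set $Q$, $n$, $f$, and the functions $h_1, \ldots, h_n$.

Proof. By Theorem 2, the indicated ratio is bounded below by zero and above by the constant $-1 + \sec(\pi/2p)$. Expanding $\sec x = 1 + x^2/2 + O(x^4)$ with $x = \pi/2p$ gives (12).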
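A quick numerical check of the constant in Corollary 3: the minimal Python sketch below compares the exact bound $\sec(\pi/2p) - 1$ with the leading term $\pi^2/(8p^2)$. The scaled gap $p^4(\sec(\pi/2p) - 1 - \pi^2/8p^2)$ stays bounded, which is consistent with the $O(1/p^4)$ remainder. The helper names are hypothetical.

```python
import math


def corollary3_constant(p: int) -> float:
    """Exact bound sec(pi/(2p)) - 1 on the relative excess error in (12)."""
    return 1.0 / math.cos(math.pi / (2 * p)) - 1.0


def leading_term(p: int) -> float:
    """Leading term pi^2 / (8 p^2) of the expansion in (12)."""
    return math.pi ** 2 / (8 * p ** 2)


for p in (2, 4, 8, 16, 32, 64):
    exact, approx = corollary3_constant(p), leading_term(p)
    # The gap exact - approx shrinks like 1/p^4, matching the O(1/p^4) remainder.
    print(f"p={p:3d}  sec-1={exact:.3e}  pi^2/(8p^2)={approx:.3e}  gap*p^4={(exact - approx) * p**4:.3f}")
```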

It is not necessary that the domain of approximation $Q$ be a subset of the complex plane $\mathbb{C}$. All that is required is that $f$ and $h_1, \ldots, h_n$ be defined on a common domain $Q$ and that a solution to Problem 1 exist.

If the point set $Q$ is not finite, then both Problems 1 and 2 can be readily transformed into linear SIPs with linear objective functions and infinitely many linear constraints, and can then be solved in the manner of Glashoff and Roleff [1]. The difference is that, for Problem 1, there is one constraint for each element of the Cartesian product $S \times Q$, where $S = \{\eta \in \mathbb{C} : |\eta| = 1\}$, whereas for Problem 2 there is only one constraint for each element of $S_p \times Q$, where $S_p = \{\eta \in \mathbb{C} : \eta^{2p} = 1\}$. In certain applications the bounds proved above can show that Problem 2 is adequate for some fixed $p \ge 2$. The numerical solution procedures of Glashoff and Roleff may then be appropriately, and potentially significantly, simplified.

On the other hand, if $Q$ is finite, Problem 2 becomes an ordinary linear program, although Problem 1 remains an SIP. The finite $Q$ case is precisely the first step of the Glashoff and Roleff method for solving Problem 1. It is not hard to see that, for $Q = \{z_1, \ldots, z_m\} \subset \mathbb{C}$, Problem 2 may be reformulated as solving an overdetermined system of $mp$ real linear algebraic equations in $2n$ real unknowns in the usual Chebyshev ($l_\infty$) norm. Full details for setting up the linear equations can be found in [4]. (This formulation works for any choice of $T = \{\theta_k\}$ provided only that $\theta_k \in T$ if and only if $\theta_k + \pi \in T$.) This real system may be written in the following block-partitioned form:


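$$
\begin{bmatrix}
R\cos\theta_1 + S\sin\theta_1 & R\sin\theta_1 - S\cos\theta_1 \\
R\cos\theta_2 + S\sin\theta_2 & R\sin\theta_2 - S\cos\theta_2 \\
\vdots & \vdots \\
R\cos\theta_p + S\sin\theta_p & R\sin\theta_p - S\cos\theta_p
\end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix}
u\cos\theta_1 + v\sin\theta_1 \\
u\cos\theta_2 + v\sin\theta_2 \\
\vdots \\
u\cos\theta_p + v\sin\theta_p
\end{bmatrix}, \tag{13}
$$

where we define

$x = [\mathrm{Re}(a_k)] \in \mathbb{R}^n$, $y = [\mathrm{Im}(a_k)] \in \mathbb{R}^n$,

$u = [\mathrm{Re}(f(z_k))] \in \mathbb{R}^m$, $v = [\mathrm{Im}(f(z_k))] \in \mathbb{R}^m$,

and the two $m \times n$ matrices

$R = [r_{kj}] = [\mathrm{Re}(h_j(z_k))]$, $S = [s_{kj}] = [\mathrm{Im}(h_j(z_k))]$.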
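To make the block structure of (13) concrete, here is a minimal Python sketch that assembles the real system from complex samples $h_k(z_i)$ and $f(z_i)$, assuming $\theta_j = (j-1)\pi/p$ (the source fixes the angles before this excerpt). It confirms that each residual equals $\mathrm{Re}(e(z_i)e^{-i\theta_j})$, so the largest residual magnitude is $\max_i |e(z_i)|_p$. The example functions, points, and helper names are hypothetical, and the sketch only builds the system; solving it in the $l_\infty$ norm would still need an LP solver such as the algorithm of [6].

```python
import cmath
import math


def build_system_13(h_vals, f_vals, p):
    """Assemble the real (m*p) x (2n) system (13) acting on [x; y].

    h_vals[i][k] = h_k(z_i) and f_vals[i] = f(z_i) for the m points of Q.
    Assumes theta_j = (j - 1) * pi / p, j = 1, ..., p.
    Block j holds rows [R cos + S sin | R sin - S cos] with right side u cos + v sin.
    """
    M, b = [], []
    for j in range(p):
        c, s = math.cos(j * math.pi / p), math.sin(j * math.pi / p)
        for h_row, f_i in zip(h_vals, f_vals):
            x_part = [h.real * c + h.imag * s for h in h_row]   # R cos + S sin
            y_part = [h.real * s - h.imag * c for h in h_row]   # R sin - S cos
            M.append(x_part + y_part)
            b.append(f_i.real * c + f_i.imag * s)                # u cos + v sin
    return M, b


def residuals(M, b, a):
    """Residual vector b - M [x; y] for complex coefficients a_k = x_k + i y_k."""
    w = [ak.real for ak in a] + [ak.imag for ak in a]
    return [bi - sum(mij * wj for mij, wj in zip(row, w)) for row, bi in zip(M, b)]


# Hypothetical example data: h_1(z) = 1, h_2(z) = z, f(z) = z^2 on three points.
Q = [0.5 + 0.5j, -1.0 + 0.2j, 0.3 - 0.8j]
h_vals = [[1.0 + 0j, z] for z in Q]
f_vals = [z * z for z in Q]
a = [0.1 - 0.2j, 0.4 + 0.3j]
M, b = build_system_13(h_vals, f_vals, p=4)
r = residuals(M, b, a)
# Row (j, i) equals Re(e(z_i) * exp(-i theta_j)) with e = f - L(a; .), so the
# largest |r| is the discretized error max_i |e(z_i)|_p.
print(len(M), len(M[0]), max(abs(ri) for ri in r))
```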

Computer CPU time and storage requirements may present severe practical limitations on the numerical solution of (13) in certain problems of genuine interest. See Streit and Nuttall [4] for an antenna array example with $n = 44$, $p = 8$, and $m = 501$, which required 1262 simplex iterations and 179 minutes on the DEC VAX 11/780 to solve (13) using the general-purpose algorithm [6]. If, however, the special structure of (13) is exploited, very significant reductions in both time and storage requirements are possible; see [7].

At least two situations might arise where the effective use of the structure of (13) in its solution would be important. First, the Glashoff–Roleff method for any given $Q$ requires the solution (by Newton–Raphson or any other workable iterative method) of a nonlinear system of algebraic equations. If the initial point is not sufficiently good, then this procedure either does not converge or converges to a nonfeasible (hence, incorrect) point. Since initial points are constructed by solving (13), it is conceivable that very large systems may have to be solved (even for small $n$) to get a sufficiently good initial point. The other reason for studying the special structure of (13) is simply that $n$ may be very large to begin with. In the kind of applications mentioned in [4], it would not be at all unreasonable to find $n > 100$. Even for small $p$, the system (13) is then very large. Either case presents an interesting problem: a large, 100% dense linear program having special structure, instead of the more typical situation of a large sparse linear program having relatively little special structure other than sparsity.

Solving the overdetermined system (13) while requiring nonnegative residuals can have interesting geometrical interpretations. For example, take $p = 2$ so that $\theta_1 = 0$ and $\theta_2 = \pi/2$. Then the $2m$ components of the residual vector of (13) are precisely the real and imaginary parts of the complex error $e(z) = f(z) - L(a; z)$ evaluated at all $m$ data points. Requiring nonnegative residuals means that we have forced the error curve $e(z)$ to lie entirely in the first quadrant of the complex plane. Furthermore, it is easy to see that we may force $e(z)$ to lie in any convex wedge-shaped sector $\Psi$ of the complex plane by making appropriate alternative choices of the two angles $\theta_1$ and $\theta_2$. Further exploration of this idea shows that upper and lower bounds for the error $W_n(f)$ defined by

$$W_n(f) = \min_{a \in \mathbb{C}^n} \max_{z \in Q} |f(z) - L(a; z)| \quad \text{subject to: } f(z) - L(a; z) \in \Psi,\ z \in Q,$$

can be obtained in terms of $W_{np}(f)$ defined by

$$W_{np}(f) = \min_{a \in \mathbb{C}^n} \max_{z \in Q} |f(z) - L(a; z)|_p \quad \text{subject to: } f(z) - L(a; z) \in \Psi,\ z \in Q.$$

This technique requires an appropriately modified set of angles $\theta_1, \ldots, \theta_{2p}$. A minimizer for $W_{np}(f)$ can then be found numerically by computing the $l_\infty$ solution of an overdetermined system of the form (13) with the additional requirement of nonnegative residuals.
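The following Python sketch illustrates the sector interpretation with two angles. The residuals are $\mathrm{Re}(e\,e^{-i\theta})$ for $\theta = \theta_1, \theta_2$; requiring both to be nonnegative confines $e$ to the wedge of arguments $[\theta_2 - \pi/2,\ \theta_1 + \pi/2]$ when $0 \le \theta_2 - \theta_1 < \pi$ (our own short derivation, offered as an illustration). The choice $\theta_1 = 0$, $\theta_2 = \pi/2$ recovers the first quadrant. Function names are hypothetical.

```python
import cmath
import math


def sector_residuals(e: complex, theta1: float, theta2: float):
    """Residuals Re(e)cos(t) + Im(e)sin(t) = Re(e * exp(-i t)) for t = theta1, theta2."""
    return [(e * cmath.exp(-1j * t)).real for t in (theta1, theta2)]


def in_sector(e: complex, theta1: float, theta2: float) -> bool:
    """True when both residuals are nonnegative.

    For 0 <= theta2 - theta1 < pi this is the wedge of arguments
    [theta2 - pi/2, theta1 + pi/2]; theta1 = 0, theta2 = pi/2 gives the first quadrant.
    """
    return all(r >= 0 for r in sector_residuals(e, theta1, theta2))


print(in_sector(1 + 2j, 0.0, math.pi / 2), in_sector(-1 + 2j, 0.0, math.pi / 2))  # True False
```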

Lemma. Let $Q$ be finite. The $2n$ columns of the coefficient matrix in (13) are linearly dependent (over the real number field) if and only if the $n$ functions $\{h_1, \ldots, h_n\}$ are linearly dependent on $Q$ (over the complex number field).

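Proof. There exist complex numbers $a_k = x_k + i y_k$, $1 \le k \le n$, not all zero, satisfying $\|\sum_k a_k h_k\|_\infty = 0$ if and only if $\|\sum_k a_k h_k\|_p = 0$. This latter equation is true if and only if

$$\max_{z \in Q} \left| \sum_{k=1}^{n} [x_k \mathrm{Re}\, h_k(z) - y_k \mathrm{Im}\, h_k(z)] + i \sum_{k=1}^{n} [x_k \mathrm{Im}\, h_k(z) + y_k \mathrm{Re}\, h_k(z)] \right|_p = 0,$$

which holds if and only if, for each $z_i \in Q$ and $\theta_j \in S_p$ ($j = 1, \ldots, p$),

$$\left[\sum_{k=1}^{n} x_k \mathrm{Re}\, h_k(z_i) - y_k \mathrm{Im}\, h_k(z_i)\right]\cos\theta_j + \left[\sum_{k=1}^{n} x_k \mathrm{Im}\, h_k(z_i) + y_k \mathrm{Re}\, h_k(z_i)\right]\sin\theta_j = 0.$$

Rearranging and using the notation of (13) gives

$$(R\cos\theta_j + S\sin\theta_j)x + (R\sin\theta_j - S\cos\theta_j)y = 0, \qquad j = 1, \ldots, p,$$

which means that the columns of the coefficient matrix in (13) are linearly dependent. This completes the proof.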
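The Lemma can be checked numerically. The minimal sketch below (pure Python, hypothetical helper names, angles assumed to be $\theta_j = (j-1)\pi/p$) builds the coefficient matrix of (13) on three points and computes its real rank by Gaussian elimination. For the complex-independent pair $h_1 = 1$, $h_2 = z$ the rank is $2n = 4$; for $h_2 = i h_1$, which is dependent over $\mathbb{C}$ but not over $\mathbb{R}$, the rank drops to 2, as the Lemma predicts.

```python
import math


def coefficient_matrix_13(h_vals, p):
    """Real (m*p) x (2n) coefficient matrix of (13); h_vals[i][k] = h_k(z_i).
    Assumes theta_j = (j - 1) * pi / p."""
    rows = []
    for j in range(p):
        c, s = math.cos(j * math.pi / p), math.sin(j * math.pi / p)
        for h_row in h_vals:
            rows.append([h.real * c + h.imag * s for h in h_row] +
                        [h.real * s - h.imag * c for h in h_row])
    return rows


def real_rank(rows, tol=1e-10):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    A = [list(row) for row in rows]
    rank = 0
    for col in range(len(A[0])):
        if rank == len(A):
            break
        pivot = max(range(rank, len(A)), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < tol:
            continue  # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for i in range(rank + 1, len(A)):
            factor = A[i][col] / A[rank][col]
            A[i] = [a_ij - factor * a_rj for a_ij, a_rj in zip(A[i], A[rank])]
        rank += 1
    return rank


Q = [0.5 + 0.5j, -1.0 + 0.2j, 0.3 - 0.8j]
independent = [[1.0 + 0j, z] for z in Q]   # h_1 = 1, h_2 = z
dependent = [[z, 1j * z] for z in Q]       # h_2 = i * h_1: dependent over C
print(real_rank(coefficient_matrix_13(independent, 4)),
      real_rank(coefficient_matrix_13(dependent, 4)))  # 4 2
```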

Theorem 3. Let $Q$ contain $m < 2n$ distinct points, let the functions $\{h_1, \ldots, h_n\}$ be linearly independent on $Q$, and let $a^{**}$ satisfy (10), where $p \ge 2$. Then

$$\|f - L(a^{**}; z)\|_\infty = E_{np}(f)\sec\left(\frac{\pi}{2p}\right). \tag{14}$$


Proof. If $f$ is linearly dependent on $h_1, \ldots, h_n$, then $E_n(f) = 0$ and, from Corollary 2, $E_{np}(f) = 0$, so (14) is trivially true. Suppose then that $f$ is linearly independent of $h_1, \ldots, h_n$. Let $a^{**}$ satisfy (10). Then $a^{**}$ is a Chebyshev ($l_\infty$) solution of the system (13), and the maximum residual has magnitude $E_{np}(f) > 0$. From the preceding Lemma, the rank of the coefficient matrix in (13) is $2n$. Hence there exists [8, p. 29] another solution $a$ of (13) such that

$$\|f - L(a; z)\|_p = E_{np}(f),$$

and a subset of at least $2n + 1$ of the $mp$ equations (13) has residuals equal in magnitude to $E_{np}(f)$. (We cannot take $a = a^{**}$ in general, because we have not assumed that the coefficient matrix in (13) satisfies the Haar condition for matrices.) Now these $2n + 1$ extremal equations must be distributed among the $m < 2n$ points of $Q$. Therefore, at least one point $z$ in $Q$ is assigned at least two equations.

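Claim. No point in $Q$ can be assigned more than two extremal equations. Note first that the residuals of the $p$ equations in (13) corresponding to a given point $z$ in $Q$ are precisely

$$r_j = A\cos\theta_j + B\sin\theta_j, \qquad j = 1, \ldots, p,$$

where $A$ and $B$ are the real and imaginary parts of $f(z) - L(a; z)$, respectively. Let $K(z)$ denote the set of indices $j$ of the extremal equations assigned to the point $z$. If $K(z)$ is not empty, then the equations

$$|A\cos\theta_j + B\sin\theta_j| = E_{np}(f), \qquad j \in K(z), \tag{15}$$

must hold simultaneously. Since $E_{np}(f) > 0$, it is clear that, if $K(z)$ contains more than two indices, the system (15) is inconsistent: because no residual exceeds $E_{np}(f)$ in magnitude, the point $A + iB$ lies in the polygon $\{w : |w|_p \le E_{np}(f)\}$, each equation in (15) places it on an edge of that polygon, and no point of the polygon lies on more than two edges. This proves our claim.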

Thus, let $z$ be a point in $Q$ which is assigned two extremal equations. Let $K(z) = \{j, k\}$ with $j \ne k$. Then the equations (15) imply

$$|A + iB| = E_{np}(f)\sec(\phi/2),$$

where $\phi$ is the smallest angle measured between the four angles $\{\theta_j, \theta_k, \theta_j + \pi, \theta_k + \pi\}$. Since Theorem 1 cannot be violated, we must have $\phi = \pi/p$. This concludes the proof.
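The last step of the proof rests on the fact that two active edge equations pin $A + iB$ to a polygon vertex. The sketch below (hypothetical helper, both residuals in (15) taken equal to $+E_{np}(f)$; the other sign choices correspond to shifting an angle by $\pi$) solves the two equations by Cramer's rule and compares $|A + iB|$ with $E_{np}(f)\sec(\phi/2)$. The modulus grows with $\phi$, so the smallest separation $\phi = \pi/p$ gives the value $E_{np}(f)\sec(\pi/2p)$ appearing in (14).

```python
import math


def corner_modulus(E: float, theta_j: float, theta_k: float) -> float:
    """|A + iB| for the point with A cos(t) + B sin(t) = E at t = theta_j and t = theta_k.

    Solves the 2 x 2 system by Cramer's rule; requires sin(theta_k - theta_j) != 0.
    """
    det = math.sin(theta_k - theta_j)  # = cos(tj) sin(tk) - sin(tj) cos(tk)
    A = E * (math.sin(theta_k) - math.sin(theta_j)) / det
    B = E * (math.cos(theta_j) - math.cos(theta_k)) / det
    return math.hypot(A, B)


p, E = 4, 1.0
for step in (1, 2, 3):  # phi = step * pi / p
    phi = step * math.pi / p
    print(step, corner_modulus(E, 0.0, phi), E / math.cos(phi / 2))
```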

If the coefficient matrix in (13) satisfies the Haar condition, then the norm (14) is attained for at least $t = \min\{2n + 1 - m, m\}$ distinct points $z$ in $Q$. In this case, $a = a^{**}$, so every point having two of the $2n + 1$ extremal equations has the residual (14). There must be at least $t$ such points, considering the claim established in the proof of Theorem 3.

The first author is employed by the Naval Underwater Systems Center, New London Laboratory, New London, CT 06320. This paper was written during his stay as a Visiting Scholar in the Department of Operations Research, Stanford University, Stanford, CA 94305. His work was supported jointly by the Office of Naval Research Project RR014-07-01 and by the Independent Research Program of the Naval Underwater Systems Center.

Naval Underwater Systems Center
New London, Connecticut 06320


1. K. Glashoff & K. Roleff, "A new method for Chebyshev approximation of complex-valued functions," Math. Comp., v. 36, 1981, pp. 233–239.

2. A. Charnes, W. W. Cooper & K. O. Kortanek, "Duality, Haar programs and finite sequence spaces," Proc. Nat. Acad. Sci. U.S.A., v. 48, 1962, pp. 783–786.

3. S.-A. Gustafson, "On semi-infinite programming in numerical analysis," in Semi-Infinite Programming (R. Hettich, Ed.), Lecture Notes in Control and Information Sciences, Vol. 15, Springer-Verlag, Berlin and New York, 1979, pp. 137–153.

4. R. L. Streit & A. H. Nuttall, "A general Chebyshev complex function approximation procedure and an application to beamforming," J. Acoust. Soc. Amer., v. 72, 1982, pp. 181–190; also NUSC Technical Report 6403, 26 February 1981 (Naval Underwater Systems Center, New London, Connecticut, USA).

5. K. Yosida, Functional Analysis, 2nd ed., Springer-Verlag, Berlin and New York, 1968.

6. I. Barrodale & C. Phillips, "Solution of an overdetermined system of linear equations in the Chebyshev norm," Algorithm 495, ACM Trans. Math. Software, v. 1, 1975, pp. 264–270.

7. R. L. Streit, Numerical Solutions of Systems of Complex Linear Equations with Constraints on the Unknowns, Stanford University Department of Operations Research SOL Report (to appear).

8. G. A. Watson, Approximation Theory and Numerical Methods, Wiley, New York, 1980.




Solution Of Systems Of Complex Linear Equations
In The $l_\infty$ Norm With Constraints On The Unknowns

R. L. Streit




SIAM J. SCI. STAT. COMPUT.
Vol. 7, No. 1, January 1986


SOLUTION OF SYSTEMS OF COMPLEX LINEAR EQUATIONS IN THE $l_\infty$
NORM WITH CONSTRAINTS ON THE UNKNOWNS*

ROY L. STREIT†

Abstract. An algorithm for the numerical solution of general systems of complex linear equations in the $l_\infty$, or Chebyshev, norm is presented. The objective is to find complex values for the unknowns so that the maximum magnitude residual of the system is a minimum. The unknowns are required to satisfy certain convex constraints; in particular, bounds on the magnitudes of the unknowns are imposed. In the algorithm presented here, this problem is replaced by a linear program generated in such a way that the relative error between its solution and a solution of the original problem can be estimated. The maximum relative error can easily be made as small as desired by selecting an appropriate linear program. Order-of-magnitude improvements in both computation time and computer storage requirements are presented for an implementation of the simplex algorithm applied to this linear program. Three numerical examples are included, one of which is a complex function approximation problem.

Key words. complex linear equations, Chebyshev solution, convex constraints, complex approximation, semi-infinite programming, $l_\infty$ norm

1. Introduction. The numerical solution of general systems of complex linear equations in the $l_\infty$, or Chebyshev, norm is a mathematical problem that arises in several applications. The objective is to find complex values for the unknowns so that the maximum magnitude residual of the system of equations is minimized. The unknowns are not allowed to assume arbitrary complex values; instead, they are required to satisfy convex constraints of the form that can occur in the applications.

Let $n$, $m$, and $r$ be positive integers. Let the matrices $A \in \mathbb{C}^{n \times m}$, $B \in \mathbb{C}^{n \times r}$, and the row vectors $f \in \mathbb{C}^m$, $g \in \mathbb{C}^r$, $a \in \mathbb{C}^n$, $d \in \mathbb{R}^n$, and $c \in \mathbb{R}^r$ be given. The vector of unknowns, $z \in \mathbb{C}^n$, is also taken to be a row vector. (Row vectors instead of column vectors are used for notational convenience in §2.) The problem is stated as follows.

Problem.

$$\min_{z \in \mathbb{C}^n} \|zA - f\|_\infty \tag{1}$$

subject to:

$$|z - a| \le d, \tag{2}$$

$$|zB - g| \le c, \tag{3}$$

where the Chebyshev norm $\|\cdot\|_\infty$ of a vector is the maximum modulus of its components, and where the modulus $|\cdot|$ of a vector is defined to be the vector consisting of the moduli of its components. The simple constraints (2) are essential to the solution algorithm presented in this paper, but the more general constraints (3) are optional.

It is assumed that $c \ge 0$ and $d \ge 0$. Zero components of $c$ and $d$ are equivalent to equality constraints of the form $zH = e$. If $H$ has rank $q \le n$, then $q$ of the unknowns can be solved for explicitly in terms of the remaining unknowns and substituted out of the problem. The reduced problem has $n - q$ unknowns and the same mathematical form as (1)–(3).

In this paper the problem (1)–(3) is replaced by a discretized problem. The discretized problem is a linear program which is generated in such a way that the relative error between its solution and a solution of the problem (1)–(3) can be estimated without knowing the solution of either. Furthermore, the maximum relative error can easily be made as small as desired by selecting an appropriate discretized problem. See Theorem 1 below.

* Received by the editors July 5, 1983, and in revised form October 1, 1984. This work was supported by the Office of Naval Research Project RR014-07-01 and by the Independent Research Program of the Naval Underwater Systems Center. This paper was written during the author's stay as a visiting scholar in the Department of Operations Research, Stanford University, Stanford, CA 94305.
† Naval Underwater Systems Center, New London, Connecticut 06320.

The starting point for the discretized problem is the following simple observation. Let $u$ be a complex number whose real and imaginary parts are $u^R$ and $u^I$, respectively. It is easy to show that

$$|u| = \max_{0 \le \theta < 2\pi} (u^R\cos\theta + u^I\sin\theta). \tag{4}$$

Let $p$ be a positive integer, and let $D = \{\theta_1, \ldots, \theta_p\}$ be a subset of the interval $[0, 2\pi)$. The discretized absolute value is defined by

$$|u|_D = \max_{\theta \in D} (u^R\cos\theta + u^I\sin\theta). \tag{5}$$

Although the set $D$ can be arbitrary, it is convenient to assume that $D$ consists of the arguments of the $p$th roots of unity, that is,

$$\theta_k = (k - 1)2\pi/p, \qquad k = 1, 2, \ldots, p, \tag{6}$$

and to assume that $p = 2^k$, $k \ge 2$. It follows that

$$|u|_D \le |u| \le |u|_D \sec(\pi/p). \tag{7}$$

No other choice of $D$ with $p$ elements can give a tighter upper bound in (7). The requirement that $p$ be a power of 2 facilitates computational efficiencies in solving the optimization problem (5) and is discussed in §2. With these two assumptions, a relative accuracy of 5 significant digits in (7) requires that $p \ge 1024$. Other properties of the discretized absolute value are given in [13].
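The figure $p \ge 1024$ can be reproduced under one reading of "5 significant digits": a relative error $\sec(\pi/p) - 1 \le 10^{-5}$ (the tighter tolerance $5 \times 10^{-6}$ gives the same $p$). The Python sketch below, with hypothetical function names, searches the powers of two and also evaluates the discretized absolute value (5) with the angles (6).

```python
import math


def discretized_abs(u: complex, p: int) -> float:
    """|u|_D of (5) with D = {(k - 1) 2 pi / p : k = 1, ..., p} as in (6)."""
    return max(u.real * math.cos(2 * math.pi * k / p) + u.imag * math.sin(2 * math.pi * k / p)
               for k in range(p))


def smallest_power_of_two(rel_tol: float, k_min: int = 2) -> int:
    """Smallest p = 2**k (k >= k_min) with sec(pi/p) - 1 <= rel_tol."""
    k = k_min
    while 1.0 / math.cos(math.pi / 2 ** k) - 1.0 > rel_tol:
        k += 1
    return 2 ** k


print(smallest_power_of_two(1e-5))  # 1024
u = 0.3 - 1.7j
# Inequality (7) with p = 8: lower bound, true modulus, upper bound.
print(discretized_abs(u, 8), abs(u), discretized_abs(u, 8) / math.cos(math.pi / 8))
```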

The discretized version of (1)–(3) is developed by first transforming it into an optimization problem and then replacing all absolute values with discretized absolute values. The discretized problem can be written in the following manner.

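Discretized problem.

$$\min_{e \in \mathbb{R},\ z \in \mathbb{C}^n} e \tag{8}$$

subject to:

$$|zA_j - f_j|_D \le e, \qquad j = 1, \ldots, m, \tag{9}$$

$$|zB_j - g_j|_D \le c_j, \qquad j = 1, \ldots, r, \tag{10}$$

$$|z_j - a_j|_D \le d_j, \qquad j = 1, \ldots, n, \tag{11}$$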

where $A_j$ and $B_j$ denote the $j$th columns of the matrices $A$ and $B$, respectively. It is shown in §2 that the discretized problem is a linear program in $2n + 1$ unknowns with $(m + n + r)p$ inequalities. This linear program cannot be assumed to be sparse, since the matrices $A$ and $B$ are completely dense in many applications.

The discretized problem is most easily solved by solving its dual. The revised simplex method applied in a straightforward manner to the dual problem requires $O((m + n + r)np)$ storage locations and $O((m + r)np)$ multiplications per simplex iteration. It is shown in this paper that the factor of $p$ can be eliminated from these estimates by exploiting the special structure of the dual. These economies leave unaltered the sequence of basic feasible solutions (vertices) which the simplex method generates en route to the solution of the dual. Thus the impact of the parameter $p$ is limited to its effect on the total number of simplex iterations required to reach the solution. As will be seen, $p$ affects the number of columns in the dual constraints and not the number of rows, so the growth of total computational effort as a function of $p$ is not great.

A Fortran program for solving the discretized problem has been written and documented [11]. Because of practical considerations discussed in §2, this program does not implement all of the economies which are possible. The program as written requires $O((m + r)n) + O(p)$ storage locations and $O((m + r)n) + O((m + n + r)\log_2 p)$ multiplications per simplex iteration. Also, for reasons stated in §3, the solution of the discretized problem for large values of $p$ is approached via smaller values of $p$. The discretized problem for $p = 4$ is first solved and its solution used as an advanced start for the $p = 8$ discretized problem. The program continues doubling $p$ at each stage until a specified value is attained. This program is practical for modest values of $m$, $n$, and $r$ and large values of $p$.
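The doubling strategy fits the angle set (6) well. One property that makes such a warm start natural (our observation, not a statement from the source) is that $D$ for $p$ is a subset of $D$ for $2p$, with $\theta_k$ reappearing as $\theta_{2k-1}$; since each dual column depends only on a constraint row and an angle, every column of the $p$ problem is also a column of the $2p$ problem. A short Python check, with hypothetical names:

```python
import math


def angles(p: int):
    """The set D of (6): theta_k = (k - 1) * 2 * pi / p for k = 1, ..., p."""
    return [(k - 1) * 2 * math.pi / p for k in range(1, p + 1)]


def index_in_doubled(k: int) -> int:
    """1-based position of theta_k (for p) inside the angle list for 2p: 2k - 1."""
    return 2 * k - 1


# Every angle used at stage p reappears at stage 2p, so each dual column
# (constraint row, angle) of the p-problem is also a column of the 2p-problem.
for p in (4, 8, 16, 32):
    small, big = angles(p), angles(2 * p)
    assert all(abs(small[k - 1] - big[index_in_doubled(k) - 1]) < 1e-12
               for k in range(1, p + 1))
print("D_p is contained in D_2p for p = 4, 8, 16, 32")
```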

The  following  theorem  proves  that  a  solution  of  the  discretized  problem  is  an 
approximate  solution  of  the  original  problem.  It  also  proves  that  the  maximum  relative 
error  in  this  approximate  solution  can  be  made  as  small  as  desired  by  appropriate 
choice  of  p.  Similar  results  for  the  unconstrained  problem  are  given  in  [12],  [13],  and 
[9]. 

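Theorem 1. Let $z^* \in \mathbb{C}^n$ solve problem (1)–(3), and let $e^{**} \in \mathbb{R}$ and $z^{**} \in \mathbb{C}^n$ solve the discretized problem (8)–(11). Then

$$e^{**} \le \|z^*A - f\|_\infty \le \|z^{**}A - f\|_\infty \le e^{**}\sec(\pi/p), \tag{12}$$

$$|z^{**}B - g| \le c\sec(\pi/p), \tag{13}$$

$$|z^{**} - a| \le d\sec(\pi/p). \tag{14}$$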

Proof. Since $|z_j^{**} - a_j|_D \le d_j$ for each $j$, it follows from (7) that

$$|z_j^{**} - a_j| \le |z_j^{**} - a_j|_D \sec(\pi/p) \le d_j\sec(\pi/p).$$

This proves (14), and (13) is proved the same way. The following sequence of inequalities establishes (12):

$$\begin{aligned}
e^{**} &= \max_j |z^{**}A_j - f_j|_D \\
&\le \max_j |z^{*}A_j - f_j|_D \\
&\le \max_j |z^{*}A_j - f_j| = \|z^*A - f\|_\infty \\
&\le \max_j |z^{**}A_j - f_j| = \|z^{**}A - f\|_\infty \\
&\le \max_j |z^{**}A_j - f_j|_D \sec(\pi/p) \\
&= e^{**}\sec(\pi/p),
\end{aligned}$$

where the max in all cases is over $j = 1, \ldots, m$. This concludes the proof.

There is one hazard in replacing the original problem with the discretized problem. The constraints of the discretized problem have a larger feasible region than the original constraints, so it is possible that the discretized problem has solutions when the original problem is infeasible. The feasible region of the original problem is approximated more and more closely as $p$ is increased, so the discretized problem ultimately fails to have a solution for sufficiently large $p$ when the original problem is infeasible. If the original problem is in some sense "nearly" feasible, but in reality is infeasible, the discretized problem may possess solutions for very large values of $p$. Thus one may be deceived in certain problems. An alternative viewpoint is that any false solution obtained in this manner to infeasible problems actually represents a "reasonable" solution to a poorly defined problem. Whether or not this view is sensible depends on the application. An example is given in §5.

The problem (1)–(3) has a mathematically straightforward solution when all the quantities are real valued instead of complex. The real valued problem is exactly equivalent to a linear program in $n + 1$ variables with $2(m + n + r)$ inequality constraints and can therefore be solved in a finite number of steps. The complex valued problem is less simple. Eliminating complex arithmetic by substituting in the real and imaginary parts of all complex quantities yields, after squaring the constraints, a mathematical programming problem in $2n + 1$ variables having a linear objective function and $m + n + r$ quadratic constraints. No method is available for solving problems of this kind in a finite number of steps. Since it is a convex programming problem and the functions involved have easily obtained derivatives of all orders, many different algorithms are potentially applicable for its approximate solution. The only reference [14] known to the author which explicitly studies the constrained complex problem (1)–(3) uses a feasible directions method. At each step, a linear program is solved to determine the steepest feasible descent direction, a line search determines the step length, and special precautions are taken to prevent zigzagging, or jamming. A convergence proof is supplied.

The problem (1)–(3) can be viewed as a semi-infinite program (SIP). The SIP formulation of the unconstrained problem, that is, the problem consisting of only the objective function (1), has been studied elsewhere [12], [13], [4] in the context of complex function approximation, and it is not difficult to extend that formulation to the constrained problem (1)–(3). None of these references, however, shows that the special structure of the discretized problem can be used to significantly reduce the computational effort in its solution. Theorems 3, 4, and 5 of the next section are also new and are unique to the complex valued problem. The relationship between SIP and real valued approximation is presented in [3].

2.  Solution  of  the  discretized  problem.  An  algorithm  for  solving  the  discretized 
problem  for  fixed  p  is  discussed  in  this  section.  Attention  is  directed  to  special  structures 
of  the  discretized  problem  which  permit  order  of  magnitude  reductions  in  both  storage 
requirements  and  multiplications  per  simplex  iteration.  Several  useful  theoretical  results 
are  interspersed. 

It is first established that the discretized problem (8)–(11) is a linear program. Denote the real and imaginary parts of any quantity $u$ by $u^R$ and $u^I$, respectively, whether $u$ be a number, a row or column vector, or a matrix. By definition (5),

$$|zA_j - f_j|_D = \max_{\theta \in D}\left[(zA_j - f_j)^R\cos\theta + (zA_j - f_j)^I\sin\theta\right], \tag{15}$$

so the $m$ inequalities (9) are equivalent to the system of $mp$ inequalities

$$(zA_j - f_j)^R\cos\theta + (zA_j - f_j)^I\sin\theta \le e, \qquad \theta \in D,\ j = 1, \ldots, m. \tag{16}$$

Since

$$(zA_j - f_j)^R = z^RA_j^R - z^IA_j^I - f_j^R, \qquad (zA_j - f_j)^I = z^RA_j^I + z^IA_j^R - f_j^I,$$

it is convenient to write (16) in the form

$$[z^R\ z^I\ e]\begin{bmatrix} A^R\cos\theta + A^I\sin\theta \\ A^R\sin\theta - A^I\cos\theta \\ -\mathbf{1}_m \end{bmatrix} \le \left[f^R\cos\theta + f^I\sin\theta\right], \qquad \theta \in D, \tag{17}$$

where $\mathbf{1}_m \in \mathbb{R}^m$ is a row vector whose components all equal one. The inequalities (10) and (11) are treated similarly, so the discretized problem is a linear program in $2n + 1$ variables and $(m + n + r)p$ inequalities. The linear program can be written explicitly as follows.

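Primal problem.

$$\min_{[z^R\ z^I\ e] \in \mathbb{R}^{2n+1}} [z^R\ z^I\ e][0_n\ 0_n\ 1]^T \tag{18}$$

subject to: $e \ge 0$ and, for each $\theta \in D$,

$$[z^R\ z^I\ e]\begin{bmatrix} A^R\cos\theta + A^I\sin\theta & B^R\cos\theta + B^I\sin\theta & I\cos\theta \\ A^R\sin\theta - A^I\cos\theta & B^R\sin\theta - B^I\cos\theta & I\sin\theta \\ -\mathbf{1}_m & 0_r & 0_n \end{bmatrix} \le \left[f^R\cos\theta + f^I\sin\theta \quad c + g^R\cos\theta + g^I\sin\theta \quad d + a^R\cos\theta + a^I\sin\theta\right], \tag{19}$$

where $I$ denotes the $n \times n$ identity matrix and $0_k$ denotes a zero row (or column, depending on context) of length $k \ge 1$.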
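As a sanity check on (17) and (19), the sketch below (hypothetical names and data) builds the coefficient row and right side of the inequality for one column $A_j$ and one angle $\theta$, then confirms that the row reproduces $\mathrm{Re}((zA_j - f_j)e^{-i\theta}) - e$, i.e. inequality (16). The $T$ and $W$ rows follow the same pattern with $B_j$ (or the $j$th column of $I$) in place of $A_j$, no $e$ term, and $c_j$ (or $d_j$) added to the right side.

```python
import cmath
import math


def primal_row_S(A_col, f_j, theta):
    """Coefficients and right side of the inequality in (19) for column A_j and angle theta.

    The variable vector is [z^R, z^I, e] (length 2n + 1); A_col holds the n complex
    entries of A_j. Returns (coeffs, rhs) with coeffs . [z^R, z^I, e] <= rhs.
    """
    c, s = math.cos(theta), math.sin(theta)
    top = [a.real * c + a.imag * s for a in A_col]   # A^R cos + A^I sin
    mid = [a.real * s - a.imag * c for a in A_col]   # A^R sin - A^I cos
    return top + mid + [-1.0], f_j.real * c + f_j.imag * s


# Hypothetical data for n = 2: the row reproduces Re((z A_j - f_j) e^{-i theta}) - e.
z = [0.3 - 0.1j, -0.7 + 0.4j]
A_col = [1.2 + 0.5j, -0.3 + 0.9j]
f_j, e, theta = 0.2 - 0.6j, 0.5, 3 * math.pi / 8
coeffs, rhs = primal_row_S(A_col, f_j, theta)
variables = [zk.real for zk in z] + [zk.imag for zk in z] + [e]
lhs_minus_rhs = sum(cv * v for cv, v in zip(coeffs, variables)) - rhs
direct = ((z[0] * A_col[0] + z[1] * A_col[1] - f_j) * cmath.exp(-1j * theta)).real - e
print(lhs_minus_rhs, direct)
```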

The primal problem is solved by solving its dual using the revised simplex method. The simplex (Lagrange) multipliers for an optimal basic solution of the dual solve the primal, assuming the primal to be feasible. The dual can be written in one of the standard linear programming formats by explicitly adding a slack variable, denoted $Q$, which arises naturally in this problem.

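Dual problem.

$$\min_{\substack{S \in \mathbb{R}^{m \times p},\ T \in \mathbb{R}^{r \times p} \\ W \in \mathbb{R}^{n \times p},\ Q \in \mathbb{R}}} \sum_{k=1}^{p} \left[(f^R\cos\theta_k + f^I\sin\theta_k)S_k + (c + g^R\cos\theta_k + g^I\sin\theta_k)T_k + (d + a^R\cos\theta_k + a^I\sin\theta_k)W_k\right] \tag{20}$$

subject to: $S \ge 0$, $T \ge 0$, $W \ge 0$, $Q \ge 0$, and

$$\sum_{k=1}^{p} \begin{bmatrix} A^R\cos\theta_k + A^I\sin\theta_k & B^R\cos\theta_k + B^I\sin\theta_k & I\cos\theta_k \\ A^R\sin\theta_k - A^I\cos\theta_k & B^R\sin\theta_k - B^I\cos\theta_k & I\sin\theta_k \\ \mathbf{1}_m & 0_r & 0_n \end{bmatrix}\begin{bmatrix} S_k \\ T_k \\ W_k \end{bmatrix} + \begin{bmatrix} 0_n \\ 0_n \\ 1 \end{bmatrix} Q = \begin{bmatrix} 0_n \\ 0_n \\ 1 \end{bmatrix}, \tag{21}$$

where $S_k$, $T_k$, and $W_k$ denote the $k$th columns of $S$, $T$, and $W$, respectively.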

An  alternative  statement  of  the  dual  is  given  at  the  end  of  this  section. 

The  slack  variable  Q  plays  a  special  role,  as  seen  in  the  next  result. 

Theorem 2. Let the matrices $S^{**} \ge 0$, $W^{**} \ge 0$, $T^{**} \ge 0$, and the real number $Q^{**} \ge 0$ denote an optimal basic feasible solution of the dual problem (20)–(21). If $Q^{**} > 0$, then the optimal value of the objective function in the primal problem (18)–(19) is zero.

Proof. Let $[z^{**R}\ z^{**I}\ e^{**}] \in \mathbb{R}^{2n+1}$ denote the simplex multipliers of the optimal basic solution $S^{**}$, $W^{**}$, $T^{**}$, $Q^{**}$. Applying the complementary slackness theorem [8, p. 77], $Q^{**} > 0$ implies $e^{**} = 0$, as claimed.

Except  for  the  slack  variable  Q,  every  basic  variable  of  the  dual  is  uniquely 
identified  by  specifying  the  matrix  to  which  it  belongs  together  with  its  location  (row 
and  column  number)  in  this  matrix.  The  matrix  names  S,  T,  and  W  correspond  to  the 
inequality systems (9), (10), and (11), respectively. The row number of a basic variable
identifies  the  particular  constraint  which  gives  rise  to  it.  For  example,  all  the  dual 
variables  in  row  q  of  matrix  T  are  eliminated  from  the  dual  problem  if  the  qth  inequality 
in  (10)  is  deleted  from  the  discretized  problem.  Similarly,  the  column  number  of  a 
basic  variable  identifies  the  angle  in  the  set  D  to  which  it  corresponds. 




The  revised  simplex  algorithm,  as  applied  to  the  dual,  is  defined  in  general  terms 
as  follows: 

Step 1. Determine an initial basic feasible solution of the dual problem.

Step 2. Compute the simplex multipliers corresponding to the current basic feasible solution.

Step  3.  Determine  the  incoming  variable  by  selecting  the  variable  having  the  most 
negative  reduced  cost  coefficient;  terminate  if  all  reduced  cost  coefficients 
are  nonnegative — the  primal  problem  is  solved  by  the  current  simplex 
multipliers. 

Step  4.  Compute  the  column  of  the  incoming  variable  in  terms  of  the  current  basis. 

Step  5.  Determine  the  outgoing  basic  variable  by  a  ratio  test;  terminate  if  the  dual 
objective  function  is  unbounded  below — the  primal  problem  is  infeasible. 

Step  6.  Update  the  basis  inverse  and  current  basic  feasible  solution  by  pivoting, 
and  return  to  Step  2. 

The  special  structure  of  the  dual  problem  has  its  strongest  influence  on  Steps  1,  3,  and 
4.  These  effects  are  outlined  next.  More  detailed  aspects  of  the  algorithm  are  postponed 
to  §  4. 

The dual problem is already in canonical form for initiating the second phase of the simplex algorithm. In other words, Step 1 is trivial because an identity matrix of order $2n + 1$ can be assembled from the columns of the coefficient matrix of (21). One readily available column is the column corresponding to the slack variable $Q$. The remaining $2n$ columns correspond to dual variables which are the components of two particular $W$ columns. From (6), $\theta_1 = 0$, so that $\cos\theta_1 = 1$ and $\sin\theta_1 = 0$. Hence one of the $W$ columns can be taken to be $W_1$. Similarly, the other is $W_{1+p/4}$, since $\theta_{1+p/4} = \pi/2$. The initial basic feasible solution is therefore

$$W_1 = W_{1+p/4} = 0_n, \qquad Q = 1. \tag{22}$$

The  simplex  multipliers  corresponding  to  (22)  are  derived  in  a  special  way  later  in 
this  section. 

The  initial  basic  feasible  solution  (22)  is  highly  degenerate.  As  discussed  in  [2], 
it  is  in  problems  of  this  general  kind  that  cycling  in  the  simplex  algorithm  is  occasionally 
observed  in  practice.  Such  cycling  was  observed  in  an  example  given  in  this  paper. 
However,  a  modification  of  the  tie-breaking  rule  in  the  ratio  test  for  the  outgoing  basic 
variable,  together  with  “preferential  treatment”  of  certain  incoming  variables,  seems 
to  avoid  the  difficulty.  Further  discussion  of  cycling  in  the  dual  problem  is  postponed 
to  §  4. 

The cost coefficients and the columns of any dual variable can be found by inspecting (20)-(21). They are given in a complex arithmetic format in Table 1. Explicitly computing and storing all (m + n + r)p columns of the dual problem is unnecessary (and impractical), since the column of any dual variable can be constructed directly from the matrices A and B. Not counting the necessary sine and cosine, this requires only n complex multiplications and reduces the storage from (2n + 1)(m + n + r)p words to only 2n(m + n + r) words. The columns of the dual variables $W_{jk}$ are merely columns of the identity matrix I multiplied by $e^{-i\theta_k}$ (see Table 1), so I need not be explicitly stored. Therefore the total storage necessary for constructing the column of any dual variable is only 2n(m + r) words. In practice, it is convenient to compute the cosines and sines once and for all to reduce the computational overhead. If this is done, as it is in [11], the storage requirement is 2n(m + r) + 2p words.
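The on-the-fly construction of a column from Table 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the helper `dual_column` is hypothetical, indices are 0-based, and the angles are assumed to be θ_k = 2π(k − 1)/p (consistent with θ_1 = 0 and θ_{1+p/4} = π/2). Only A, B and the 2p precomputed cosines and sines are stored; an S or T column costs n complex multiplications and a W column costs none.

```python
import numpy as np

def dual_column(kind, j, k, A, B, cos_t, sin_t):
    """Build the R^(2n+1) column of S_jk, T_jk or W_jk as listed in Table 1.

    A (n x m) and B (n x r) are complex; cos_t/sin_t hold cos and sin of the
    p angles in D.  Indices j and k are 0-based here.
    """
    n = A.shape[0]
    rot = complex(cos_t[k], -sin_t[k])          # e^{-i theta_k}
    if kind == "S":
        w, last = A[:, j] * rot, 1.0            # n complex multiplications
    elif kind == "T":
        w, last = B[:, j] * rot, 0.0
    elif kind == "W":
        w = np.zeros(n, dtype=complex)
        w[j] = rot                              # I_j e^{-i theta_k}: no products
        last = 0.0
    else:
        raise ValueError(f"unknown dual variable kind {kind!r}")
    # real part, minus imaginary part, then the entry in the last row of (21)
    return np.concatenate([w.real, -w.imag, [last]])
```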




ROY  L.  STREIT 


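Table 1

Dual variable cost coefficients and columns in complex format.

dual variable    cost coefficient                    column, in $\mathbb{R}^{2n+1}$
$S_{jk}$         $(f_j e^{-i\theta_k})^R$            $[(A_j e^{-i\theta_k})^R \;\; -(A_j e^{-i\theta_k})^I \;\; 1]^T$
$T_{jk}$         $(c_j + g_j e^{-i\theta_k})^R$      $[(B_j e^{-i\theta_k})^R \;\; -(B_j e^{-i\theta_k})^I \;\; 0]^T$
$W_{jk}$         $(d_j + a_j e^{-i\theta_k})^R$      $[(I_j e^{-i\theta_k})^R \;\; -(I_j e^{-i\theta_k})^I \;\; 0]^T$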

An efficient method of computing the smallest reduced cost coefficient in Step 3 of the revised simplex algorithm is now discussed. This method is particularly interesting because the columns of the dual variables are not explicitly needed. The only data required are the original complex matrices A and B and the sines and cosines of the angles in D. Let $\lambda$ be any real row vector of simplex multipliers for the dual problem; thus, $\lambda$ is of length 2n + 1. The vector $\lambda$ defines a complex row vector $z \in \mathbb{C}^n$ and a real number $\varepsilon$ by the identification

(23)  $\lambda = [z^R \;\; z^I \;\; -\varepsilon] \in \mathbb{R}^{2n+1}.$

The reduced cost of the dual variable $S_{jk}$ is the cost coefficient of $S_{jk}$ minus the product of $\lambda$ with the column of $S_{jk}$. Using (23) and Table 1 gives

(24)  $C_{S_{jk}} = (f_j e^{-i\theta_k})^R - [z^R \;\; z^I \;\; -\varepsilon]\,[(A_j e^{-i\theta_k})^R \;\; -(A_j e^{-i\theta_k})^I \;\; 1]^T = \varepsilon - [(zA_j - f_j)\,e^{-i\theta_k}]^R,$

so the minimum reduced cost coefficient of the p variables in row j of S is

(25)  $C'_{S_j} = \min_{1 \le k \le p} C_{S_{jk}} = \varepsilon - |zA_j - f_j|_D, \qquad j = 1, 2, \ldots, m.$

The smallest reduced cost coefficient of all the dual variables of S is then

(26)  $C_S = \min_{1 \le j \le m} C'_{S_j} = \varepsilon - \max_{1 \le j \le m} |zA_j - f_j|_D.$

Similarly, the minimum reduced cost coefficients over all the dual variables of T and W are

(27)  $C_T = \min_{1 \le j \le r} \{c_j - |zB_j - g_j|_D\}$

and

(28)  $C_W = \min_{1 \le j \le n} \{d_j - |z_j - a_j|_D\},$

respectively. The smallest reduced cost of all the variables of the dual problem is $\min\{C_S, C_W, C_T\}$.

The smallest of the three quantities $C_S$, $C_W$, and $C_T$ and the index j for which the minimum value is attained determine the row number and the correct matrix name of the incoming dual variable. The column number is determined by the angle $\theta_k \in D$ giving the largest projection (i.e., the discretized absolute value) at the minimal index j. The angle $\theta_k$ may not be unique because of possible ties in (5), so a tie-breaking rule called the minimal clockwise index (MCI) rule is used to determine the incoming dual variable unambiguously.
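The pricing rule (26)-(28) can be sketched as below. This is an illustrative sketch, not the method of [11]: `price` is a hypothetical name, |u|_D is taken to be max_k Re(u e^{-iθ_k}) with θ_k = 2π(k − 1)/p (our reading of (5)-(6), consistent with (24)-(25)), the discretized absolute value is computed by brute force rather than by the faster methods discussed below, and ties go to the first maximizing angle instead of the MCI rule. No dual column is ever formed.

```python
import numpy as np

def price(z, eps, A, f, B, g, c, a, d, p):
    """Smallest reduced cost over all dual variables, via (26)-(28).

    Returns (value, (matrix, j, k)) with 0-based j and k.  Assumes
    theta_k = 2*pi*k/p (0-based k); ties go to the first maximizing angle,
    a simplification of the MCI rule.
    """
    rot = np.exp(-2j * np.pi * np.arange(p) / p)            # e^{-i theta_k}
    candidates = []
    for name, u, base in (("S", z @ A - f, np.full(len(f), float(eps))),
                          ("T", z @ B - g, np.asarray(c, dtype=float)),
                          ("W", z - a, np.asarray(d, dtype=float))):
        if len(u) == 0:                                      # e.g. no T constraints
            continue
        proj = (u[:, None] * rot[None, :]).real              # Re(u_j e^{-i theta_k})
        k = proj.argmax(axis=1)                              # angle giving |u_j|_D
        reduced = base - proj[np.arange(len(u)), k]          # e.g. eps - |zA_j - f_j|_D
        j = int(reduced.argmin())
        candidates.append((float(reduced[j]), (name, j, int(k[j]))))
    return min(candidates, key=lambda t: t[0])
```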

The MCI rule is defined for all $u \in \mathbb{C}$. Let $u_D$ be the set of those angles $\theta \in D$ for which the maximum in (5) is attained. There are three cases. First, if $u_D$ has precisely one element, the MCI of u is defined to be the index of that element. Second, if $u_D$ has precisely two elements, say $\theta_k$ and $\theta_j$, and neither k nor j equals p, then the MCI of u is defined to be $\min\{k, j\}$; on the other hand, if either k = p or j = p, then the MCI of u is taken to be p. Third, if $u_D$ has more than two elements, then it must be that u = 0 and $u_D = D$, so the MCI of u is defined to be 1.
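A direct transcription of the definition and of the MCI rule is sketched below, assuming, as above, that |u|_D = max_k Re(u e^{-iθ_k}) with θ_k = 2π(k − 1)/p. It uses the brute-force 2p real multiplications; the function name and the relative tolerance `tol` used to detect ties in floating-point arithmetic are assumptions of this sketch.

```python
import math

def disc_abs_mci(u, p, tol=1e-12):
    """Return (|u|_D, MCI of u) by brute force over the p angles (2p real products).

    Assumes theta_k = 2*pi*(k-1)/p and |u|_D = max_k Re(u e^{-i theta_k});
    indices are 1-based as in the text.  tol is a relative tie tolerance.
    """
    proj = []
    for k in range(1, p + 1):
        t = 2 * math.pi * (k - 1) / p
        proj.append(u.real * math.cos(t) + u.imag * math.sin(t))   # Re(u e^{-i t})
    best = max(proj)
    ties = [k for k in range(1, p + 1)
            if best - proj[k - 1] <= tol * max(1.0, abs(best))]
    if len(ties) == 1:          # first case: unique maximizing angle
        mci = ties[0]
    elif len(ties) == 2:        # second case: two maximizing angles
        mci = p if p in ties else min(ties)
    else:                       # third case: only possible when u = 0, u_D = D
        mci = 1
    return best, mci
```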

The computation of the discretized absolute value and corresponding MCI must be undertaken for m + n + r complex numbers to compute (26)-(28) during each iteration of the simplex algorithm in Step 3. A brute force approach using the definition (5) requires 2p real multiplications for each complex number. Such an approach is inefficient and does not exploit the special form of the set D. For p = 4, it is clear that comparison tests alone suffice to solve this subproblem. For p ≥ 8, comparison tests and at most $2\log_2 p - 5$ real multiplications are sufficient. To see this, first determine the quadrant of the complex plane in which the given number lies, and determine whether it lies above or below the 45° line bisecting the quadrant. This can be done using comparison tests only. Now that the "half-quadrant" in which the number lies is known, its projections onto the bounding rays of this half-quadrant can be computed in this special case using only one multiplication. If p = 8, a final comparison test ends the problem. If p ≥ 16, then the larger of the two projections reveals the "quarter-quadrant" in which the number must lie. The projection onto one of the bounding rays of this quarter-quadrant is already known, so it is only necessary to compute the projection onto the other bounding ray. This requires 2 real multiplications. If p = 16, a final comparison test ends the problem. If p ≥ 32, we continue as before. Counting the total possible number of steps proves the claim: one multiplication for p = 8 and two more each time p is doubled, that is, $1 + 2(\log_2 p - 3) = 2\log_2 p - 5$. This bisection method works because of the special form of the set D.
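For p = 8 the bisection idea reduces to a few lines. The projections onto the axis directions are ±x and ±y, and the best diagonal projection is (|x| + |y|)/√2, so comparisons plus one multiplication by 1/√2 suffice. The sketch below (function name hypothetical, angles θ_k = (k − 1)π/4 assumed) returns a maximizing index but does not apply the MCI tie-breaking rule.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)

def disc_abs_p8(u):
    """|u|_D for p = 8 using comparisons and one real multiplication.

    Returns (value, k) with k a 1-based index of a maximizing angle
    theta_k = (k-1)*pi/4; ties are not resolved by the MCI rule here.
    """
    x, y = u.real, u.imag
    ax, ay = abs(x), abs(y)
    # Axis directions (k = 1, 3, 5, 7): projections are x, y, -x, -y.
    if ax >= ay:
        axis_val, axis_k = ax, (1 if x >= 0 else 5)
    else:
        axis_val, axis_k = ay, (3 if y >= 0 else 7)
    # Diagonal directions (k = 2, 4, 6, 8): best projection is (|x|+|y|)/sqrt(2).
    diag_val = (ax + ay) * INV_SQRT2                # the single multiplication
    if y >= 0:
        diag_k = 2 if x >= 0 else 4
    else:
        diag_k = 8 if x >= 0 else 6
    return (axis_val, axis_k) if axis_val >= diag_val else (diag_val, diag_k)
```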

In principle the discretized absolute value and corresponding MCI can be found with computational effort independent of p. The argument (phase) of the given complex number can be computed, essentially as an inverse tangent, and from it the MCI can be found using comparison tests. Whenever the inverse tangent computation requires fewer than $2\log_2 p - 5$ multiplications, it is more efficient than the bisection method described above. For p ≤ 1024 the bisection method is more efficient, and it is used in [11].
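The inverse tangent approach can be sketched with `math.atan2`: since Re(u e^{-iθ}) = |u| cos(arg u − θ), the grid angle nearest to arg u attains |u|_D. This is a sketch under the same assumption θ_k = 2π(k − 1)/p, with a hypothetical function name; its cost (one `atan2`, one rounding, and one cosine and sine, which could be taken from the precomputed tables) does not grow with p, and ties are not resolved by the MCI rule.

```python
import math

def disc_abs_atan(u, p):
    """Return (|u|_D, k) from the phase of u; cost does not grow with p.

    Assumes theta_k = 2*pi*(k-1)/p (k 1-based).  Since
    Re(u e^{-i theta}) = |u| cos(arg u - theta), the grid angle nearest to
    arg u is a maximizer.  Ties are not resolved by the MCI rule.
    """
    if u == 0:
        return 0.0, 1
    step = 2 * math.pi / p
    phase = math.atan2(u.imag, u.real) % (2 * math.pi)     # arg u in [0, 2*pi)
    k0 = int(math.floor(phase / step + 0.5)) % p           # nearest grid angle, 0-based
    t = k0 * step
    return u.real * math.cos(t) + u.imag * math.sin(t), k0 + 1
```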

The number of real multiplications required to complete Step 3 using these methods is significantly less than that required in the usual approach. The straightforward method requires the computation of (m + n + r)p - (2n + 1) real inner products of length 2n. Taking account of the simple form of the W columns gives a total of approximately

(29)  $(2p - 4)[n(m + r) + 1] + 4n(m + r - n - 1/2)$

real multiplications. The special methods discussed above require m + n + r complex inner products of length n, followed by the computation of the discretized absolute value and corresponding MCI for each inner product. Counting one complex multiplication as four real multiplications and considering the special form of the W columns gives a total of

(30)  $4n(m + r) + (m + n + r)N_D$

real multiplications, where $N_D$ is the number of multiplications needed to compute one discretized absolute value and corresponding MCI. If the inverse tangent method is used, $N_D$ is a constant independent of p. If the bisection method is used, $N_D = 2\log_2 p - 5$ for p ≥ 8 and $N_D = 0$ for p = 4. The special methods are clearly better when p ≥ 4 and m > n. In the derivation of both (29) and (30) it was assumed that the last row of (21) in the dual problem was specially treated to avoid multiplications by 1 and 0.

The simplex multipliers $\lambda^{(0)} \in \mathbb{R}^{2n+1}$ corresponding to the initial basic feasible solution (22) are now derived. Multiplying the initial basis inverse on the left by the row vector containing the cost coefficients of the initial basic variables gives the row vector $\lambda^{(0)}$. The initial basis inverse is the identity matrix, the cost coefficients of the basic W variables (22) are given in Table 1, and the cost coefficient of the slack variable Q is 0. Consequently,

$\lambda^{(0)} = [d + (a)^R \;\; d + (-ia)^R \;\; 0] = [d + a^R \;\; d + a^I \;\; 0] \in \mathbb{R}^{2n+1}.$

The definition (23) thus gives

(31)  $z^{(0)} = a + \sqrt{2}\, d\, e^{i\pi/4}, \qquad \varepsilon^{(0)} = 0.$

From the proof of Theorem 2 it can be seen that ε = 0 as long as the slack variable remains in the basis and is positive.
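The derivation of (31) can be checked numerically by assembling the starting basis from Table 1 and solving λB = c_B. The sketch below assumes that the slack Q has the last unit vector as its column (it appears only in the last row of (21)); `initial_multipliers` is a hypothetical name.

```python
import numpy as np

def initial_multipliers(a, d):
    """Multipliers for the starting basis (22), converted to (z, eps) via (23).

    Basic variables: W_{j,1} (theta = 0), W_{j,1+p/4} (theta = pi/2), j = 1..n,
    and the slack Q.  Columns and costs follow Table 1.
    """
    a = np.asarray(a, dtype=complex)
    d = np.asarray(d, dtype=float)
    n = len(a)
    cols, costs = [], []
    for rot in (1.0 + 0j, -1j):                      # e^{-i*0}, e^{-i*pi/2}
        for j in range(n):
            w = np.zeros(n, dtype=complex)
            w[j] = rot                               # I_j e^{-i theta}
            cols.append(np.concatenate([w.real, -w.imag, [0.0]]))
            costs.append(d[j] + (a[j] * rot).real)   # (d_j + a_j e^{-i theta})^R
    cols.append(np.eye(2 * n + 1)[-1])               # slack Q: last unit vector (assumed)
    costs.append(0.0)
    basis = np.column_stack(cols)                    # the identity, as stated in the text
    lam = np.linalg.solve(basis.T, np.array(costs))  # solves lam @ basis = c_B
    return lam[:n] + 1j * lam[n:2 * n], -lam[2 * n]
```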

The matrices S, W, and T are sparse because a basic feasible solution of the dual has only 2n + 1 basic variables. Furthermore, no row of W or T (and, when ε > 0, no row of S) can contain more than two basic variables, as the next theorem shows.

Theorem 3. No basic feasible solution of the dual problem (20)-(21) can have more than two basic variables in any one row of W or T. If a basic feasible solution of the dual problem has corresponding simplex multipliers with ε > 0, then S cannot have more than two basic variables in any one row.

Proof. The first statement is proved for the matrix T; the proof for W is a special case. Consider the jth row of T. Suppose a basic feasible solution has three basic variables $T_{j\alpha}$, $T_{j\beta}$, and $T_{j\gamma}$ with α, β, and γ distinct. Then the reduced costs for all three variables must be zero. A result analogous to (24) was used to prove (27); using it here gives

(32)  $C_{T_{jq}} = 0 = c_j - [(zB_j - g_j)\,e^{-i\theta_q}]^R, \qquad q = \alpha, \beta, \gamma.$

Thus the single complex number $zB_j - g_j$ has the same projection, namely $c_j$, in three distinct directions. Since $\mathrm{Re}(u e^{-i\theta}) = |u|\cos(\arg u - \theta)$ takes any given value at no more than two angles in [0, 2π) when u ≠ 0, this is impossible unless $zB_j - g_j = c_j = 0$, in contradiction to the assumption that $c_j > 0$. This establishes the first statement. The second statement is proved in the same way, by using (24) itself.

The  following  theorem  relates  knowledge  of  an  optimal  basis  of  the  dual  to 
“observable”  quantities  in  the  primal  problem.  The  results  of  the  theorem  depend  on 
the  names,  but  not  the  actual  numerical  values,  of  the  optimal  dual  basis.  In  addition 
it  seems  to  indicate  that  the  upper  bound  (12)  in  Theorem  1  will  often  be  attained  in 
practice. 

Theorem 4. Let $\varepsilon^{**} \in \mathbb{R}$ and $z^{**} \in \mathbb{C}^n$ denote the simplex multipliers in the form (23) of an optimal basis for the dual problem (20)-(21), and suppose that $\varepsilon^{**} > 0$. If the jth row of one of the matrices S, W, or T contains two optimal basic variables in columns α and β with $p \ge \alpha > \beta \ge 1$, then either $\alpha - \beta = 1$ or $\alpha - \beta = p - 1$. If $\alpha - \beta = 1$, then

(33)  $z^{**}A_j - f_j = \varepsilon^{**} \sec(\pi/p) \exp[i(2\beta - 1)\pi/p],$

or

(34)  $z^{**}B_j - g_j = c_j \sec(\pi/p) \exp[i(2\beta - 1)\pi/p],$

or

(35)  $z^{**}_j - a_j = d_j \sec(\pi/p) \exp[i(2\beta - 1)\pi/p],$

according to whether the jth row is a row of S, T, or W, respectively. Replacing β with p in (33)-(35) gives the equations corresponding to the alternative case $\alpha - \beta = p - 1$.

Proof. Only the S matrix case is treated since the other two cases are similar. The two basic variables involved are $S_{j\alpha}$ and $S_{j\beta}$. Assume that $p \ge \alpha > \beta \ge 1$. The reduced costs $C_{S_{j\alpha}}$ and $C_{S_{j\beta}}$ must be 0, so (24) gives the two equations

(36)  $\varepsilon^{**} = [(z^{**}A_j - f_j)\,e^{-i\theta_\alpha}]^R, \qquad \varepsilon^{**} = [(z^{**}A_j - f_j)\,e^{-i\theta_\beta}]^R.$

A complex number is uniquely determined, in both magnitude and phase, by a common projection value in two distinct directions. If $\theta_\alpha$ differs from $\theta_\beta$ by π radians, the system (36) implies that $\varepsilon^{**} = 0$, contrary to assumption. Thus (36) implies that $|z^{**}A_j - f_j| = \varepsilon^{**} \sec(\phi/2) > 0$, where $\phi = \min\{\theta_\alpha - \theta_\beta,\; 2\pi - \theta_\alpha + \theta_\beta\}$. By Theorem 1, $\phi = 2\pi/p$, so that either $\theta_\alpha - \theta_\beta = 2\pi/p$ or $\theta_\alpha - \theta_\beta = 2\pi(p - 1)/p$. From (6), either $\alpha - \beta = 1$ or $\alpha - \beta = p - 1$. For $\alpha - \beta = 1$, solving the system (36) for the phase of $z^{**}A_j - f_j$ gives (33). The case $\alpha - \beta = p - 1$ is handled in the same way. This completes the proof.

Theorem  4  is  useful  in  practice.  Computed  optimal  dual  solutions  can  be  inspected 
to  verify  that  optimal  basic  variables  occurring  in  the  same  row  are  in  fact  “paired” 
in  the  manner  described.  If  they  are  not,  then  numerical  round-off  errors  have  adversely 
affected  the  computed  solution. 
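The inspection suggested above can be automated. The sketch below (hypothetical helper names; angles θ_k = 2π(k − 1)/p assumed) checks that two basic variables in the same row are adjacent, as Theorem 4 requires, and returns the residual predicted by (33)-(35); comparing that prediction with the computed z**A_j − f_j (or its T or W analogue) gives a round-off check.

```python
import cmath
import math

def is_paired(alpha, beta, p):
    """Theorem 4: basic columns alpha > beta (1-based) in one row must be adjacent."""
    return alpha - beta in (1, p - 1)

def predicted_residual(bound, alpha, beta, p):
    """Residual predicted by (33)-(35) for a paired row.

    bound is eps** (row of S), c_j (row of T) or d_j (row of W).
    """
    if not is_paired(alpha, beta, p):
        raise ValueError("basic variables not paired: suspect round-off")
    b = beta if alpha - beta == 1 else p     # alternative case: replace beta by p
    return bound / math.cos(math.pi / p) * cmath.exp(1j * (2 * b - 1) * math.pi / p)
```

For instance, the pairs 3/2/768 and 3/2/769 (p = 2048) in Table 3 and 2/1/13 and 2/1/14 (p = 32) in Table 5 pass the adjacency test.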

Theorem 5. Let $\varepsilon^{**}$ and $z^{**}$ be as in Theorem 4. If the jth row of one of the matrices S, W, or T contains an optimal basic variable in column α, $1 \le \alpha \le p$, then

$\varepsilon^{**} \le |z^{**}A_j - f_j| \le \varepsilon^{**} \sec(\pi/p), \qquad \theta_\alpha - \pi/p \le \arg(z^{**}A_j - f_j) \le \theta_\alpha + \pi/p,$

or

$c_j \le |z^{**}B_j - g_j| \le c_j \sec(\pi/p), \qquad \theta_\alpha - \pi/p \le \arg(z^{**}B_j - g_j) \le \theta_\alpha + \pi/p,$

or

$d_j \le |z^{**}_j - a_j| \le d_j \sec(\pi/p), \qquad \theta_\alpha - \pi/p \le \arg(z^{**}_j - a_j) \le \theta_\alpha + \pi/p,$

according to whether the jth row is a row of S, T, or W, respectively.

Proof. The proof is closely related to the method of proof of Theorem 4 and is not presented.

This section is concluded with a concise statement of the dual problem using complex arithmetic notation.

Dual problem: complex format.

$\min_{S,T,W,Q}\; [(fS + gT + aW)\,e^{-iD}]^R + \sum_{k=1}^{p} (cT_k + dW_k)$

subject to: S ≥ 0, T ≥ 0, W ≥ 0, Q ≥ 0, and

$(AS + BT + W)\,e^{-iD} = 0 \in \mathbb{C}^n, \qquad Q + \sum_{j=1}^{m}\sum_{k=1}^{p} S_{jk} = 1.$

We have used $e^{-iD}$ to denote a complex column vector of length p whose kth component is $\exp(-i\theta_k)$; other notation is unchanged from (20)-(21).

3. Solution of the discretized problem for large p. One reason to solve large p discretized problems is that applications requiring 5 or more significant digits of relative accuracy in the optimal value of the objective function and/or in constraint satisfaction need to take p ≥ 1024; see Theorem 1. Another reason to solve large p problems is that their solutions furnish starting points for other methods which potentially provide greater accuracy. For instance, the problem (1)-(3) can be rewritten as a semi-infinite program, or SIP, and an interesting algorithm [5], [6] for solving a class of SIPs can be utilized. This method sets up an appropriate nonlinear system of algebraic equations, which are solved using the Newton-Raphson method (or another iterative method); a feasible solution of the nonlinear system is a solution of the SIP. The starting point of the Newton-Raphson iteration is taken to be the solution of a discretized problem. Large p discretized problems will have to be solved whenever very good starting points are needed to ensure convergence of the Newton-Raphson iteration.

There is, however, a practical limit to how large p may be taken in many problems. A discretized problem is numerically unstable for sufficiently large p if its optimal solution has, for every p, two basic dual variables in at least one row of S, W, or T. By Theorem 4 such variables lie in adjacent columns, whose angles differ by only 2π/p, so their columns become less distinguishable numerically as p increases (see Table 1). Consequently, the basis matrix is more ill-conditioned for large p. Only those problems which never, for any p, have more than one optimal basic variable per row of S, W, or T can escape numerical ill-conditioning from this cause. Such problems seem to be uncommon.

The algorithm we suggest for solving the discretized problem for large p begins by solving the smallest dual problem, that is, the dual problem with p = 4. Next, the p = 8 dual problem is solved using the optimal basis for the p = 4 dual to start the simplex algorithm. The p = 16 dual is then solved starting at the optimal basis for the p = 8 dual, and so forth. The algorithm is always well-defined: since the sets D are nested for p = 4, 8, 16, 32, ..., basic feasible dual solutions for a given p are also basic feasible dual solutions for all larger values of p. By doubling p at each stage beginning with p = 4, this algorithm avoids bases associated with numerical instability from the discretization process until p becomes very large. Difficulties caused by ill-conditioning in the complex equations themselves cannot, of course, be avoided.
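Because D_p ⊂ D_{2p}, a basic variable named i/j/k in the p problem keeps its angle in the 2p problem under the new column number 2k − 1. A minimal sketch of carrying a basis forward, assuming θ_k = 2π(k − 1)/p; the helper names are hypothetical.

```python
import math

def theta(k, p):
    """Angle of 1-based column k in D (assumes theta_k = 2*pi*(k-1)/p)."""
    return 2 * math.pi * (k - 1) / p

def carry_basis(names):
    """Rename basic dual variables 'i/j/k' of the p problem for the 2p problem.

    Column k at p has the same angle as column 2k-1 at 2p.
    """
    out = []
    for name in names:
        i, j, k = (int(s) for s in name.split("/"))
        out.append(f"{i}/{j}/{2 * k - 1}")
    return out
```

Carrying a basis forward only supplies the starting point; as Table 3 shows, the simplex iterations at the larger p generally move to other columns.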

One  advantage  of  this  algorithm  is  that  the  optimal  basis  for  each  intermediate 
value  of  p  can  be  easily  inspected  using  Theorems  4  and  5  to  determine  if  numerical 
round-off  errors  are  significant.  If  sufficient  error  is  present,  the  algorithm  can  be 
terminated  early,  or  alternatively,  the  basis  can  be  reinverted  before  continuing  to  the 
next  value  of  p. 

The primary drawback of the algorithm is that more simplex iterations are usually required to reach the final optimal dual basis by proceeding via smaller values of p than by solving the full dual problem all at once. This difficulty does not seem to be significant in practice and, in any event, can be partially overcome by skipping more rapidly through the available values of p. It is also possible to begin the algorithm with a larger initial value of p, that is, p > 4.

Optimal solutions of the primal discretized problem converge only linearly with increasing p, while the optimal values ε** converge quadratically. It would be useful to be able to extrapolate the primal solutions to obtain a better solution of the original problem (1)-(3). Richardson extrapolation (see, e.g., [7], [10]) worked very well for Examples 1-3 in § 5 for sufficiently large p, but failed in other problems. It is apparently successful only when (a) the row numbers of the optimal dual basic variables of the discretized problems identify the optimal active constraints of the original problem, and (b) the optimal values of the discretized problems equal the optimum value of the original problem. The first requirement can be met by taking p sufficiently large. The second requirement imposes more severe limitations on the practical utility of Richardson extrapolation.

4. Details of the revised simplex algorithm. Computer codes which treat complex matrices and vectors by separating them into their real and imaginary parts cause thrashing on virtual memory systems. Therefore the solution vector z of the primal problem is best stored as a complex vector, and the simplex multipliers are reordered to reflect the storage of z. The rows of the dual problem should also be reordered. The computer code therefore takes the dual problem rows in the following order: {1, n + 1, 2, n + 2, ..., n - 1, 2n - 1, n, 2n, 2n + 1}. These numbers denote the row numbers in the original system (21). The reordered system is much easier to work with in FORTRAN than the original system. With the rows of the dual problem in this order, the reduced cost calculations can be coded in FORTRAN just as they are written in (26)-(28), provided the initial data of the problem are typed COMPLEX.
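The reordering amounts to interleaving the real and imaginary rows, which matches how complex arrays are stored (real and imaginary parts adjacent, as in Fortran COMPLEX or NumPy `complex128`). A short NumPy sketch with a hypothetical helper name:

```python
import numpy as np

def interleaved_row_order(n):
    """Row order {1, n+1, 2, n+2, ..., n, 2n, 2n+1} of the dual rows (1-based)."""
    order = []
    for j in range(1, n + 1):
        order += [j, n + j]              # real row j, then imaginary row n+j
    return order + [2 * n + 1]           # the last row of (21) stays last

# With this order, the first 2n multipliers line up with the complex storage of z.
z = np.array([1 + 2j, 3 - 4j, 5 + 6j])
lam = np.concatenate([z.real, z.imag, [-0.5]])        # natural order, as in (23)
perm = np.array(interleaved_row_order(len(z))) - 1     # 0-based permutation
print(np.array_equal(lam[perm][:-1], z.view(np.float64)))   # True
```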

The name of a dual variable is a triplet i/j/k of positive integers, where:

i = 1, 2, or 3 according to whether it is an S, W, or T variable,
j = constraint number, from (9)-(11),
k = projection number of the angle in the set D, 1 ≤ k ≤ p.

The middle name j has different ranges depending on the value of the first name i. These triplets are ordered lexicographically.

The most negative reduced cost determines the entering basic variable in the simplex algorithm. Ties for the most negative reduced cost are broken by choosing the variable with the least lexicographically ordered name. Because the highly degenerate initial starting point (22) can cause cycling in the simplex algorithm, there is one exception to the least name rule in case of ties for the entering variable. As long as the slack variable Q remains in the basis, the only entering variables permitted are S variables with negative reduced costs. If S variables with negative reduced costs do not exist, then the entering variable is permitted to be a W or a T variable, and ties are resolved by the least name rule. Thus, S variables are given priority for entering the basis only for as long as the slack Q is in the basis. Once Q is removed from the basis it never enters again, and exceptions to the tie-breaking rule cease.

The  outgoing  basic  variable  is  determined  by  the  usual  ratio  test.  If  the  least  ratio 
is  attained  by  more  than  one  variable,  the  variable  having  the  largest  magnitude  pivot 
leaves  the  basis.  If  more  than  one  variable  has  the  same  magnitude  pivot,  then  the 
variable  with  least  index  is  selected.  Because  of  degeneracy  and  cycling,  there  is  one 
exception  to  this  tie-breaking  rule  for  the  exiting  variable.  So  long  as  the  slack  Q 
remains  in  the  basis,  only  W  variables  are  permitted  to  exit.  This  rule  makes  sense 
only  when  a  W  variable  is  involved  in  the  tie;  if  no  such  W  variable  exists,  the 
exception  is  not  invoked.  If  more  than  one  W  variable  is  involved  in  the  tie,  then  the 
one  having  the  largest  magnitude  pivot  with  the  least  index  is  selected  to  exit.  Just  as 
for  the  entering  variable,  this  exception  ceases  once  the  slack  Q  leaves  the  basis. 
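The entering and exiting rules above can be sketched as two small selection functions. Names are triplets (i, j, k) compared lexicographically as integer tuples; the function names and input layouts are hypothetical, and ties are detected by exact equality, which is a simplification of what a production code would do.

```python
def choose_entering(candidates, q_in_basis, tol):
    """Pick the entering dual variable.

    candidates: list of (reduced_cost, (i, j, k)); i = 1, 2, 3 for S, W, T.
    Returns the chosen name, or None when no reduced cost is below -tol.
    """
    neg = [(rc, name) for rc, name in candidates if rc < -tol]
    if q_in_basis and any(name[0] == 1 for _, name in neg):
        neg = [(rc, name) for rc, name in neg if name[0] == 1]   # S variables only
    if not neg:
        return None
    best = min(rc for rc, _ in neg)
    return min(name for rc, name in neg if rc == best)           # least name wins ties


def choose_leaving(rows, q_in_basis):
    """Pick the leaving basic variable after the ratio test.

    rows: list of (ratio, abs_pivot, index, is_w) for rows with an acceptable pivot.
    """
    least = min(r[0] for r in rows)
    tied = [r for r in rows if r[0] == least]
    if q_in_basis and any(r[3] for r in tied):
        tied = [r for r in tied if r[3]]                        # only W may exit
    # largest |pivot| first, then least index
    return min(tied, key=lambda r: (-r[1], r[2]))[2]
```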

Cycling  in  the  simplex  algorithm  has  not  been  observed  with  these  modifications 
to  the  usual  tie  breaking  rules  for  entering  and  exiting  variables.  However,  if  these 
modifications  are  not  used,  cycling  may  well  occur.  Example  3  of  §  5  below  cycled 
(with  a  cycle  of  length  19)  without  these  modifications.  It  is  possible  that  cycling  in 
this  particular  example  is  an  artifact  of  finite  precision  arithmetic. 

A  nonzero  tolerance  is  necessary  when  testing  for  the  most  negative  reduced  cost 
and  for  possible  divisors  in  the  ratio  test.  This  number  must  not  be  too  small  and  it 
must  somehow  be  dependent  on  the  scale  of  the  problem  data.  The  number  used  in 
[11] is the product of the unit round-off error of the host computer with the sum of the absolute values of the incoming column (i.e., its $\ell_1$ norm). This number is used for both the reduced cost and the pivot tolerance tests.
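A sketch of this tolerance, assuming IEEE double precision, whose unit round-off is 2^-53 (half of NumPy's `finfo(float64).eps`); the function names are hypothetical.

```python
import numpy as np

# Assumed unit round-off: IEEE double precision, 2**-53 (NumPy's eps is 2**-52).
UNIT_ROUNDOFF = np.finfo(np.float64).eps / 2

def simplex_tolerance(incoming_column):
    """Unit round-off times the l1 norm of the incoming column."""
    return UNIT_ROUNDOFF * float(np.sum(np.abs(incoming_column)))

def is_improving(reduced_cost, incoming_column):
    """Reduced-cost test: treat the variable as improving only below -tol."""
    return reduced_cost < -simplex_tolerance(incoming_column)

def is_usable_pivot(entry, incoming_column):
    """Ratio-test divisor test: entries at or below tol are not used as divisors."""
    return entry > simplex_tolerance(incoming_column)
```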

Besides the usual termination criteria in the simplex algorithm, the pricing method implicit in (26)-(28) yields a novel way to terminate the algorithm. The pricing method computes the most negative reduced cost by indirectly examining all reduced costs, not just the reduced costs of the nonbasic variables. Hence it can happen that the entering and the exiting variables are identical because of numerical round-off errors. This event seems to signal that no further improvement in the solution is numerically possible. Solutions returned by terminating the algorithm whenever this "self-cycling" occurs appear to be satisfactory.

The  Fortran  code  [11]  was  developed  to  test  the  methods  described  for  solving 
the  dual  problem.  It  holds  an  explicit  basis  inverse  and  performs  pivoting  to  update 
the  inverse  in  each  simplex  iteration.  Pivoting  is  known  to  be  numerically  unstable, 
but  easily  programmed.  To  forestall  numerical  difficulties  the  inverse  is  held  in  double 
precision,  although  a  double  precision  inverse  is  not  a  satisfactory  substitute  for  a 
numerically  stable  technique.  Updating  the  QR  factorization  of  the  basis  is  preferable. 
Nonetheless  the  explicit  inverse  code  gives  good  performance  in  many  problems. 

5.  Examples.  Example  1  is  taken  directly  from  [14,  p.  249].  Let  n  =  2,  m  =  5,  r  =  2, 
and  define  the  matrices 


1 

.5  2 

»-[? 

2' 

3 

-1  1. 

L2 

-4.

(37)  $a = g = [0, 0], \qquad c = [\sqrt{2}, \sqrt{2}], \qquad d = [10, 10],$

$f = [-1 + i,\; -1 + i,\; .5i,\; 0,\; -1 + i].$

Only the vector f is complex. The exact solution is $z_1 = (-1 + i)/2$, $z_2 = 0$, and $\varepsilon = \sqrt{2}/2$. The constraints of type (2) are not part of the original problem given in [14]. They have been added because their discretizations provide the initial dual basis.

Table 2 gives the solutions of the discretized primal problem for selected values of p. The optimal value of ε for p = 8 is the optimal value of ε for all p ≥ 8. For p ≥ 8, the accuracy of the primal solutions depends solely upon the discretization errors, since the optimal ε does not change. Table 3 gives the optimal basic solutions of the dual problems for the same values of p. The active constraints do not change for p ≥ 8, except for the angle (θ) part of their names. Hence the active constraints at the optimum of the original nonlinear problem (1)-(3) have been identified. The fourth and fifth basic variables are "paired" in an obvious way; this behavior is explained by Theorem 4.

All optimal dual solutions are degenerate, or very nearly so. It turns out that the "degenerate parts" of the optimal dual solutions approximately double as p is doubled, especially when p ≥ 64. Assuming the trend continues indefinitely, the optimal dual solution will eventually look nondegenerate. This trend is probably an artifact of the numerical ill-conditioning inherent in the discretization process.

The conditions mentioned at the end of § 3 for success using Richardson extrapolation seem to be met for p ≥ 8. Since convergence of the z vectors is linear, multiply the p = 32 vector by two and subtract the p = 16 vector to get

$[z_1^R,\; z_1^I,\; z_2^R,\; z_2^I] = [-.500964,\; .499036,\; .40 \times 10^{-10},\; .36 \times 10^{-10}].$

One step of this extrapolation gives values nearly as accurate as the values corresponding to p = 2048.

Numerical computations for this and the next two examples were performed on a DEC 10. It has a double precision unit round-off error of approximately $2 \times 10^{-19}$.

Example 1 can be made infeasible by adjoining one constraint of type (3). Replace B, g, and c in (37) with

B = [j j],  $g = [0, 0, 7 - 4i]$,  $c = [\sqrt{2}, \sqrt{2}, 29/4]$.

The discretized primal problem is feasible for p = 4 and 8; for p ≥ 16, it is infeasible.


This illustrates the remark made in § 1 that some discretized problems have feasible solutions when the original problem is actually infeasible.

Table 2

Solutions of the primal problem. Example 1.

p                       4            8            16           32           2048
$z_1^R$             -.588760     -.292893     -.400544     -.450754     -.499233
$z_1^I$              .588760      .707107      .599456      .549246      .500767
$z_2^R$             .59×10^-?    -.82×10^-?   -.28×10^-?   -.12×10^-?   -.15×10^-?
$z_2^I$            -.59×10^-?    -.82×10^-?   -.12×10^-?   -.78×10^-10  -.15×10^-?
ε                    .4112399     .7071068     .7071068     .7071068     .7071068
total iterations       10           14           17           20           38

Table 3

Solutions of the dual problem. Example 1.

p                       4            8            16           32           2048
basis names             ?          1/1/8        1/1/15       1/1/29       1/1/1793
                      1/2/4        1/2/1        1/2/16       1/2/30       1/2/1794
                      1/3/3        3/1/4        3/1/6        3/1/12       3/1/768
                      3/2/2        3/2/3        3/2/6        3/2/12       3/2/768
                        ?          3/2/4        3/2/7        3/2/13       3/2/769
basis values         .714286     1.000000     1.000000     1.000000     1.000000
                     .000000     .50×10^-?    .66×10^-?    .11×10^-?    .58×10^-17
                     .285714    -.50×10^-?    .91×10^-?    .24×10^-?    .19×10^-17
                     .000000     .11×10^-?    .22×10^-?    .43×10^-?    .27×10^-?
                     .214286     .500000      .500000      .500000      .500000

Example 2 is the same as Example 1, except that constraints of type (2) are tightened so that they are active at the solution. Replace the vector d in (37) with d = [.4, .4]. The exact solution of this problem is $\varepsilon = \sqrt{2} - .4$, $z_1 = (-1 + i)\sqrt{2}/5$, and

3.7(Vi - 1)- (431<>02- 19032071)^ ^  _  20ss46w3- 
300-1200V2 

r?  ■  [r  (j!  't!)T  *  -  °9333M68- 

Tables 4 and 5 give, respectively, the solutions of the primal and dual discretized
problems for selected values of p. The obvious "pairing" of basic variables in Table 5 is
explained by Theorem 4. The conditions for success using Richardson extrapolation
seem to be met for p ≥ 32. Extrapolation of the p = 32 and p = 64 vectors in Table 4
performed as in Example 1 gives

$[z_1^R, z_1^I, z_2^R, z_2^I] = [-.282776,\ .282911,\ -.092665,\ -.209334]$,

which  is  comparable  to  the  values  corresponding  to  p  =  2048. 
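The extrapolated vector can be reproduced from the p = 32 and p = 64 rows of Table 4. The sketch below assumes the one-step Richardson rule for an error that decays like 1/p, namely z ≈ 2z(2p) - z(p); the text only says the step is performed as in Example 1, so this rule is our assumption, although it reproduces all four printed components.

```python
# One step of Richardson extrapolation, assuming the error in z(p) behaves
# like C/p: if z(p) = z + C/p, then 2*z(2p) - z(p) = z exactly.
def richardson_step(z_p, z_2p):
    """Extrapolate componentwise from the solutions at p and 2p."""
    return [2.0 * hi - lo for lo, hi in zip(z_p, z_2p)]


# [z1^R, z1^I, z2^R, z2^I] for p = 32 and p = 64, copied from Table 4.
z_32 = [-0.310700, 0.254985, -0.061857, -0.214390]
z_64 = [-0.296738, 0.268948, -0.077261, -0.211862]

z_extrap = richardson_step(z_32, z_64)
print([round(v, 6) for v in z_extrap])
```

The extrapolated components agree with the p = 2048 row of Table 4 to about three decimal places, at a fraction of the simplex iterations.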

Table 4

Solutions of the primal problem. Example 2.

p       z_1^R      z_1^I      z_2^R      z_2^I      e          total iterations
4       -.400000   .400000    .153553    -.153553   .600000     7
8       -.345442   .220244    -.026274   -.243431   1.014214   11
16      -.293794   .271891    -.076571   -.217608   1.014214   13
32      -.310700   .254985    -.061857   -.214390   1.014214   16
64      -.296738   .268948    -.077261   -.211862   1.014214   18
2048    -.283277   .282409    -.092805   -.208960   1.014214   33

Table 5

Solutions of the dual problem. Example 2.

basis names
p       
4       1/2/1      1/2/4      2/1/3      3/2/2      3/2/3
8       1/2/8      1/3/6      2/1/4      3/2/3      3/2/4
16      1/2/15     1/3/12     2/1/7      3/2/5      3/2/6
32      1/2/29     1/3/22     2/1/13     2/1/14     3/2/10
64      1/2/57     1/3/43     2/1/25     2/1/26     3/2/19
2048    1/2/1793   1/3/1346   2/1/769    2/1/770    3/2/558

basis values
p       
4       1.   0.   1.   0.   0.
8       1.   0.   1.   0.   0.
16      1.   0.   1.   0.   0.
32      1.   0.   1.   0.   0.
64      1.   0.   1.   0.   0.
2048    1.   0.   1.   0.   0.

Example 3 is taken from [12] and is an unconstrained complex function approximation
problem; that is, constraints of type (2)-(3) are absent. The 101 columns of the
matrix $A \in \mathbb{C}^{3\times 101}$ are

$$A_j = [\,1 \;\; \exp(i(j-1)\pi/400) \;\; \exp(i2(j-1)\pi/400)\,]^T, \qquad j = 1, 2, \ldots, 101,$$

while the components of $f \in \mathbb{C}^{101}$ are $f_j = \exp(i3(j-1)\pi/400)$, $j = 1, 2, \ldots, 101$. In
other words, the complex valued function $e^{i3x}$ is approximated by complex linear
combinations of the three functions $1$, $e^{ix}$, and $e^{i2x}$ over 101 equispaced points on the
$x$-interval $[0, \pi/4]$. Bounds of type (2) must be specified, so we take a = [0, 0, 0],
d = [10, 10, 10]. These constraints are not active at the optimal solution.

It can be verified that the exact solution of Example 3 is $z_1 = a_1 \exp(i3\pi/8)$,
$z_2 = a_2 \exp(i5\pi/4)$, $z_3 = a_3 \exp(i\pi/8)$, where

$$
\begin{aligned}
a_1 &= a \approx .96157056080646,\\
a_2 &= b - 2(b - a^2)/(1 - a^2) \approx 2.8122548927058,\\
a_3 &= a(1 - 2b + a^2)/(1 - a^2) \approx 2.8477590650226,\\
a &= \lambda\cos(\pi/16) + (1 - \lambda)\cos(\pi/8),\\
b &= \lambda\cos(\pi/8) + (1 - \lambda)\cos(\pi/4),\\
c &= \lambda\cos(3\pi/16) + (1 - \lambda)\cos(3\pi/8),\\
\lambda &= \sin(\pi/8)/(\sin(\pi/16) + \sin(\pi/8)),\\
e &= (1 - c\,a_1 + b\,a_2 - a\,a_3)^{1/2} \approx .014706309694449.
\end{aligned}
$$
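The closed-form solution can be evaluated directly. This sketch transcribes the formulas above (writing `lam` for the weight that multiplies the first cosine in a, b, and c) and also computes the largest error of the resulting approximation over the 101 sample points x = (j - 1)π/400; if the formulas give the exact optimum, that largest error should coincide with e.

```python
import cmath
import math

# Weight and cosine averages from the closed-form solution of Example 3.
lam = math.sin(math.pi / 8) / (math.sin(math.pi / 16) + math.sin(math.pi / 8))
a = lam * math.cos(math.pi / 16) + (1 - lam) * math.cos(math.pi / 8)
b = lam * math.cos(math.pi / 8) + (1 - lam) * math.cos(math.pi / 4)
c = lam * math.cos(3 * math.pi / 16) + (1 - lam) * math.cos(3 * math.pi / 8)

# Moduli of the optimal coefficients and the optimal error.
a1 = a
a2 = b - 2 * (b - a * a) / (1 - a * a)
a3 = a * (1 - 2 * b + a * a) / (1 - a * a)
e = math.sqrt(1 - c * a1 + b * a2 - a * a3)

# Optimal coefficients z_k = a_k * exp(i * phase_k).
z1 = a1 * cmath.exp(1j * 3 * math.pi / 8)
z2 = a2 * cmath.exp(1j * 5 * math.pi / 4)
z3 = a3 * cmath.exp(1j * math.pi / 8)

# Largest error of z1 + z2*e^{ix} + z3*e^{i2x} against e^{i3x} on the 101 points.
xs = [j * math.pi / 400 for j in range(101)]
max_err = max(abs(cmath.exp(3j * x) - (z1 + z2 * cmath.exp(1j * x) + z3 * cmath.exp(2j * x)))
              for x in xs)

print(f"a1={a1:.14f} a2={a2:.13f} a3={a3:.13f} e={e:.15f} max_err={max_err:.15f}")
```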




Tables  6  and  7  give,  respectively,  the  solutions  of  the  primal  and  dual  discretized 
problems  for  selected  values  of  p.  The  obvious  “pairing”  of  the  basic  variables  in  Table 
7  is  explained  by  Theorem  4.  Note  also  that  the  row  numbers  of  the  optimal  dual  basic 
variables  are  different  for  p  =  1024  and  p  =  64  (probably  because  the  dual  does  not 
have  a  unique  solution).  Nonetheless,  Richardson  extrapolation  works  when  applied 
to  the  cases  p  =  32  and  p  =  64.  As  in  the  previous  two  examples,  one  extrapolation 
step  gives 

$[z_1^R, z_1^I, z_2^R, z_2^I, z_3^R, z_3^I] = [.367954,\ .888319,\ -1.988481,\ -1.988481,\ 2.630930,\ 1.089767]$

which, in turn, gives the values $a_1 = .96151$, $a_2 = 2.81214$, $a_3 = 2.84770$. The case $p = 1024$
used directly gives the values $a_1 = .96236$, $a_2 = 2.81376$, $a_3 = 2.84855$, which are clearly
inferior to the extrapolated values.
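The same rule applied to the p = 32 and p = 64 rows of Table 6, followed by taking moduli |z_k|, reproduces the a_k values quoted above. As in the previous sketch, the rule 2z(2p) - z(p) is our assumption about how the extrapolation was performed; the reference values are the closed-form a_k of Example 3.

```python
import math


def richardson_step(z_p, z_2p):
    """One Richardson step for an O(1/p) error: 2*z(2p) - z(p), componentwise."""
    return [2.0 * hi - lo for lo, hi in zip(z_p, z_2p)]


def moduli(v):
    """[z1^R, z1^I, z2^R, z2^I, z3^R, z3^I] -> [|z1|, |z2|, |z3|]."""
    return [math.hypot(v[k], v[k + 1]) for k in range(0, len(v), 2)]


# Rows of Table 6 (Example 3).
z_32 = [0.377718, 0.911891, -2.022845, -2.022845, 2.654502, 1.099531]
z_64 = [0.372836, 0.900105, -2.005663, -2.005663, 2.642716, 1.094649]
z_1024 = [0.368281, 0.889108, -1.989632, -1.989632, 2.631719, 1.090094]

exact = [0.96157056080646, 2.8122548927058, 2.8477590650226]  # closed-form a_1, a_2, a_3
a_extrap = moduli(richardson_step(z_32, z_64))
a_direct = moduli(z_1024)

for label, vals in (("extrapolated", a_extrap), ("p = 1024", a_direct)):
    print(label, [round(x, 5) for x in vals],
          "max error", max(abs(x - y) for x, y in zip(vals, exact)))
```

The extrapolated moduli are off by about 1e-4 at worst, against about 1.5e-3 for the direct p = 1024 solution.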


Table 6

Solutions of the primal problem. Example 3.

p       z_1^R     z_1^I     z_2^R       z_2^I       z_3^R     z_3^I     e          total iterations
8       .378265   .913212   -2.026895   -2.026895   2.654494  1.099528  .0141560   20
16      .377950   .912452   -2.024346   -2.024346   2.654624  1.099581  .0145244   25
32      .377718   .911891   -2.022845   -2.022845   2.654502  1.099531  .0147063   33
64      .372836   .900105   -2.005663   -2.005663   2.642716  1.094649  .0147063   36
1024    .368281   .889108   -1.989632   -1.989632   2.631719  1.090094  .0147063   48

Table 7

Solutions of the dual problem. Example 3.

basis names
p       
8       1/1/8     1/25/4    1/28/5     1/74/8     1/77/1     1/101/5     1/101/6
16      1/1/14    1/1/15    1/26/7     1/27/8     1/75/16    1/76/1      1/101/9
32      ?         ?         1/26/13    1/26/14    1/76/32    1/101/17    1/101/18
64      ?         1/1/57    1/26/26    1/26/27    1/76/63    1/101/33    1/101/34
1024    1/1/896   1/1/897   1/26/417   1/76/993   1/76/994   1/101/513   1/101/514

basis values
p       
8       .163234   .244029   .091325    .070677    .263126    .157491     .010118
16      .004365   .160548   .170912    .164573    .173184    .162060     .164358
32      .000000   .168829   .000000    .331171    .331171    .168829     .000000
64      .000000   .168829   .000000    .331171    .331171    .168829     .000000
1024    .000000   .168829   .331171    .331170    .000000    .168829     .000000

(?: entry illegible.)

Another unconstrained complex function approximation problem in [12] is moderately
large and completely dense. The motivating background and engineering application
of this problem are fully discussed in [12]. The 501 columns of the matrix
$A \in \mathbb{C}^{44\times 501}$ are

$$A_j = [\exp(ik_1x_j) \;\; \exp(ik_2x_j) \;\; \cdots \;\; \exp(ik_{44}x_j)]^T - \exp(ik_{45}x_j)\,[1 \;\; 1 \;\; \cdots \;\; 1]^T, \qquad j = 1, 2, \ldots, 501,$$




where $1 = k_1 < k_2 < \cdots < k_{44} < k_{45} = 49$ are the distinct integers between 1 and 49,
excluding the integers 7, 17, 21, and 29, and where $x_j = u_0 + (j - 1)(1 - u_0)/250$,
$j = 1, 2, \ldots, 501$, with $u_0 = .0538117$. The components of $f \in \mathbb{C}^{501}$ are $f_j = \exp(ik_{45}x_j)$,
$j = 1, \ldots, 501$. This example lacks constraints of type (2)-(3). The discretized problem
for p = 16 was solved on a DEC VAX 11/780 in 1350 simplex iterations. Total CPU
time was 25 minutes and .7 million page faults were incurred. Only 80,000 words of
storage were needed. In contrast, the algorithm proposed in [12] (which utilizes the
algorithm [1] as a subroutine) solved this problem on the same VAX in 1270 simplex
iterations, requiring 179 minutes of CPU time and incurring 11 million page faults.
Over 360,000 words of storage were needed. The difference in the number of simplex
iterations is explained as follows. The algorithm [12] solves the full problem for p = 16,
while the algorithm developed in this paper solves the p = 4 problem and the p = 8
problem before solving the p = 16 problem. This indirect route to the full problem
solution is less efficient in this example than solving the p = 16 problem immediately.

6. Concluding remarks. A solution of the discretized problem for sufficiently large
p identifies the constraints active at a solution of the original problem (1)-(3). Deleting
inactive constraints from the original problem yields an equality constrained nonlinear
optimization problem. Lagrange's method gives rise to a nonlinear system of algebraic
equations in the optimum value e, the solution vector z, and the multipliers λ. Iterative
methods for the solution of this system can be started from an initial point (e, z, λ)
provided by a discretized problem solution. Safeguarded Newton-Raphson iteration
may be highly effective for solving this system, especially if advantage is taken of the
system's special form (i.e., for λ given, the vector z can be found by solving a system
of linear equations). A possible limitation of this approach is that very large values of
p might be necessary in order to identify the right active set. The examples of the
previous section, however, indicate that the optimal active set is found for relatively
small values of p. Specifically, in Examples 1, 2, and 3, the correct active sets (determined
from the optimal dual basis names in Tables 3, 5, and 7) first appear when p is 8, 8,
and 32, respectively.

Certain kinds of domain and range constraints can be adjoined to the discretized
problem (8)-(11) with only minor extension of the algorithm proposed here. Let the
matrix $H \in \mathbb{C}^{n\times q}$, and the row vectors $e \in \mathbb{C}^q$, $\phi \in \mathbb{R}^q$, and $h \in \mathbb{R}^q$ be given. Then the
constraints

$$\big((zH_j - e_j)\exp(-i\phi_j)\big)^R \le h_j, \qquad j = 1, \ldots, q, \tag{38}$$

are linear in $z^R$ and $z^I$ and so can be added to the discretized problem. The constraints
(10) and (11) are instances of (38); however, (38) can impose constraints not possible
with (10) and (11). For instance, if q = 1, the constraint that the complex number
$zH_1 - e_1$ must lie in the right half of the complex plane is equivalent to
$\big((zH_1 - e_1)\exp(-i\pi)\big)^R \le 0$. Furthermore, if $q \ge 1$ and all columns of H and all components
of e are identical to the first ones, then the number $zH_1 - e_1$ can be confined to any closed convex polygonal
region (bounded or unbounded) in the complex plane by appropriate choices of φ, h,
and q.
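Each constraint in (38) is a half-plane condition on w = zH_1 - e_1: the real part of w exp(-iφ_j) is the projection of w onto the direction φ_j. A small sketch of the polygon construction; the helper name `satisfies_38` and the hexagon (q = 6, φ_j = 2π(j - 1)/6, h_j = 1) are hypothetical choices for illustration.

```python
import cmath
import math


def satisfies_38(w, phi, h, tol=1e-12):
    """Check (w * exp(-i*phi_j))^R <= h_j for every j, i.e. constraint (38)
    with w standing for zH_1 - e_1 (all columns of H and entries of e equal)."""
    return all((w * cmath.exp(-1j * p)).real <= hj + tol for p, hj in zip(phi, h))


# q = 1, phi = pi, h = 0: the right half of the complex plane, as in the text.
right_half = ([math.pi], [0.0])

# Hypothetical hexagon: q = 6 half-planes, phi_j = 2*pi*(j-1)/6, h_j = 1 (inradius 1).
q = 6
hexagon = ([2 * math.pi * j / q for j in range(q)], [1.0] * q)

print(satisfies_38(0.5 + 3j, *right_half), satisfies_38(-0.5 + 3j, *right_half))  # True False
print(satisfies_38(0.9 + 0.3j, *hexagon), satisfies_38(1.2 + 0j, *hexagon))      # True False
```

Because each test is linear in the real and imaginary parts of z, adding such rows keeps the discretized problem a linear program.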

When complex function approximation on an arc or domain boundary in the
complex plane gives rise to the problem (1)-(3), then an implicit natural ordering of
the columns of the matrix A exists. The ordering is inherited from the ordering of the
discrete points along the arc, and it makes possible clever strategies of both multiple
and partial pricing which may significantly reduce overall computation time when m
and n are large. Effective partial pricing schemes require far fewer evaluations of the




vector-matrix products $zA_j$ in (26) without significantly increasing the total number of
iterations. Effective multiple pricing schemes decrease the number of iterations by
increasing the change in e in each iteration. Both multiple and partial pricing can be
implemented simultaneously.

One particularly interesting problem is complex function approximation on the
mth roots of unity. When $m \ge n$ and m is a power of 2, the fast Fourier transform
(FFT) algorithm can be used to compute the m products $zA_j$ in $2m\log_2 m$ operations.
The straightforward products $zA_j$ require mn operations. Therefore, the FFT method
is more efficient whenever $2\log_2 m \le n \le m$.
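A sketch of the FFT shortcut. The text does not spell out A for this case, so we assume polynomial approximation, i.e. columns A_j = [1, ω_j, ..., ω_j^(n-1)]^T with ω_j = exp(2πi(j - 1)/m). Zero-padding z to length m then turns all m products zA_j into a single length-m transform. A plain recursive radix-2 FFT keeps the example dependency-free; the function names are ours.

```python
import cmath
import math


def fft_plus(x):
    """Recursive radix-2 FFT with a positive exponent:
    X[j] = sum_k x[k] * exp(+2*pi*i*j*k/N).  len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_plus(x[0::2])
    odd = fft_plus(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def products_direct(z, m):
    """zA_j for the assumed columns A_j = [1, w_j, ..., w_j^(n-1)]^T with
    w_j = exp(2*pi*i*(j-1)/m); index j here runs 0..m-1. About m*n multiplications."""
    return [sum(zk * cmath.exp(2j * math.pi * j * k / m) for k, zk in enumerate(z))
            for j in range(m)]


def products_fft(z, m):
    """The same m products from one FFT of z zero-padded to length m (m >= n, m a power of 2)."""
    return fft_plus(list(z) + [0j] * (m - len(z)))


def fft_not_worse(n, m):
    """Operation-count condition quoted in the text: 2*log2(m) <= n <= m."""
    return 2 * math.log2(m) <= n <= m


z = [1 + 2j, -0.5j, 0.25, 3 - 1j]  # hypothetical coefficients, n = 4
m = 16
err = max(abs(a - b) for a, b in zip(products_fft(z, m), products_direct(z, m)))
print(err < 1e-9, fft_not_worse(20, 1024), fft_not_worse(19, 1024))  # True True False
```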

It has been assumed throughout this paper that the unknown vector z must lie in
$\mathbb{C}^n$. In some applications it is necessary to restrict z to $\mathbb{R}^n$, while still retaining complex
matrices A and B in the original problem (1)-(3). Setting $z^I = 0$ in the discretized problem
is equivalent to eliminating n of the 2n + 1 rows of the dual problem constraints (21).
The techniques developed for the dual problem simplify when applied to this modified
problem. Consequently, the modified dual problem is smaller and easier to solve.
Examples and a Fortran program for this problem are given in [11].


REFERENCES

[1] I. Barrodale and C. Phillips, Solution of an overdetermined system of linear equations in the Chebyshev norm, ACM Algorithm 495, ACM Trans. Math. Software, 1 (1975), pp. 264-270.

[2] S. I. Gass, Comments on the possibility of cycling with the simplex method, Letter to the Editor, Oper. Res., 27 (1979), pp. 848-852.

[3] K. Glashoff and S.-A. Gustafson, Linear Optimization and Approximation, Springer-Verlag, New York, 1983.

[4] K. Glashoff and K. Roleff, A new method for Chebyshev approximation of complex-valued functions, Math. Comp., 36 (1981), pp. 233-239.

[5] S.-A. Gustafson, Nonlinear systems in semi-infinite programming, in Numerical Solution of Nonlinear Algebraic Systems, G. B. Byrnes and C. A. Hall, eds., Academic Press, New York, 1973, pp. 63-99.

[6] S.-A. Gustafson and K. Kortanek, Numerical treatment of a class of semiinfinite programming problems, Naval Research Log. Quart., 20 (1973), pp. 477-504.

[7] D. C. Joyce, Survey of extrapolation processes in numerical analysis, SIAM Rev., 13 (1971), pp. 435-488.

[8] D. G. Luenberger, Introduction to Linear and Nonlinear Programming, Addison-Wesley, Reading, MA, 1973.

[9] G. Opfer, Solving complex approximation problems by semiinfinite-finite optimization techniques: a study on convergence, Numer. Math., 39 (1982), pp. 411-420.

[10] A. Ralston, A First Course in Numerical Analysis, McGraw-Hill, New York, 1965.

[11] R. L. Streit, An algorithm for the solution of systems of complex linear equations in the l∞ norm with constraints on the unknowns, ACM Trans. Math. Software, 11, 3 (1985).

[12] R. L. Streit and A. H. Nuttall, Linear Chebyshev complex function approximation and an application to beamforming, J. Acoust. Soc. Amer., 72(1) (1982), pp. 181-190. (Also in Naval Underwater Systems Center Report 6403, 26 February 1981.)

[13] R. L. Streit and A. H. Nuttall, A note on the semi-infinite programming approach to complex approximation, Math. Comp., 40 (1983), pp. 599-605.

[14] S. I. Zukhovitskiy and L. I. Avdeyeva, Linear and Convex Programming, W. B. Saunders, Philadelphia, 1966. (Original Russian edition, Moscow, 1964.)




ALGORITHM  635: 

An  Algorithm  For  The  Solution 
Of  Systems  Of  Complex  Linear  Equations 
In The l∞ Norm With Constraints On The Unknowns

R.  L.  Streit 




ALGORITHM  635 

An  Algorithm  for  the  Solution  of  Systems  of 
Complex Linear Equations in the l∞ Norm
with  Constraints  on  the  Unknowns 

ROY  L.  STREIT 

Naval  Underwater  Systems  Center 


Categories and Subject Descriptors: G.1.2 [Numerical Analysis]: Approximation - minimax approximation
and algorithms, G.1.3 [Numerical Analysis]: Numerical Linear Algebra - linear systems
(direct and iterative methods), G.1.6 [Numerical Analysis]: Optimization - linear programming

General  Terms:  Algorithms,  Complex  Systems 

Additional Key Words and Phrases: complex linear equations, Chebyshev solution, complex approximation,
constraints, semi-infinite programming


1.  DESCRIPTION 

The set of FORTRAN subroutines given here is an implementation of the
algorithm [1] for computing l∞, or Chebyshev, solutions to complex systems of
equations with constraints on the unknowns.

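Problem

$$\min_{\varepsilon \in \mathbb{R},\; z \in \mathbb{C}^n} \varepsilon \tag{1}$$

subject to the approximation constraints

$$|zA_j - f_j| \le \varepsilon, \qquad j = 1, \ldots, m, \tag{2}$$

general bound constraints

$$|zB_j - g_j| \le c_j, \qquad j = 1, \ldots, l, \tag{3}$$

and the simple bound constraints

$$|z_j - h_j| \le d_j, \qquad j = 1, \ldots, n. \tag{4}$$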

It is assumed that the matrices $A \in \mathbb{C}^{n\times m}$, $B \in \mathbb{C}^{n\times l}$, and the row vectors $f \in \mathbb{C}^m$,

This work was supported by the Office of Naval Research Project PR014-07-01 and by the Independent
Research Program of the Naval Underwater Systems Center.

Author's address: Naval Underwater Systems Center, New London, CT 06320.

© 1985 ACM 0098-3500/85/0900-0242

ACM Transactions on Mathematical Software, Vol. 11, No. 3, September 1985, Pages 242-249.




$g \in \mathbb{C}^l$, $h \in \mathbb{C}^n$, $d \in \mathbb{R}^n$, and $c \in \mathbb{R}^l$ are all given. It is also assumed that $c_j > 0$ and
$d_j > 0$ for all indices j. The vector of unknowns, z, is taken to be a row vector for
reasons of notational convenience. Also, the jth columns of matrices A and B are
denoted $A_j$ and $B_j$, respectively. Note that m is allowed to be greater than,
less than, or equal to n. The simple bounds (4) are always assumed to be in the
problem statement; however, the more general bounds (3) are allowed to be
nonexistent. A different set of subroutines is given to solve this problem when
the solution vector z is required to be real valued.

The  algorithm  is  a  very  efficient  implementation  of  the  simplex  method  of 
linear  programming  applied  to  a  discretized  version  of  this  problem. 

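Discretized Problem

$$\min_{\varepsilon \in \mathbb{R},\; z \in \mathbb{C}^n} \varepsilon \tag{5}$$

subject to:

$$
\begin{aligned}
|zA_j - f_j|_D &\le \varepsilon, & j &= 1, \ldots, m, && (6)\\
|zB_j - g_j|_D &\le c_j, & j &= 1, \ldots, l, && (7)\\
|z_j - h_j|_D &\le d_j, & j &= 1, \ldots, n, && (8)
\end{aligned}
$$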

where, for any complex number $u \in \mathbb{C}$, we define the "discretized absolute value"

$$|u|_D = \max_{1 \le k \le p} \big|(\operatorname{Re} u)\cos\theta_k + (\operatorname{Im} u)\sin\theta_k\big|, \tag{9}$$

where $D = \{\theta_1, \ldots, \theta_p\}$ with

$$\theta_k = (k - 1)\,2\pi/p, \qquad k = 1, 2, \ldots, p, \tag{10}$$

and p is a positive integer controlling the degree of discretization. In this
implementation of the algorithm, we have required that p = 2**LOGP, where
LOGP is greater than or equal to one. From [1, Eq. (12)] we have

$$|u|_D \le |u| \le |u|_D \sec(\pi/p). \tag{11}$$


Thus, to attain a relative accuracy of five significant decimal digits (i.e., a relative
error less than 0.5 × 10^-5) in the discretized absolute value requires that
p ≥ 1024. Other properties of the discretized absolute value are given in [1]
and [2]. Also in [2] is a discussion of problem (1)-(2), without the constraints
(3)-(4), as a semi-infinite program (SIP).
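Definitions (9)-(11) and the p ≥ 1024 requirement can be checked numerically. A minimal sketch that uses only (9) and (10) as stated; the function names `abs_d` and `smallest_p` are ours and are not part of the CAPROX package.

```python
import math


def abs_d(u, p):
    """Discretized absolute value (9): max over theta_k = (k-1)*2*pi/p of
    |Re(u)*cos(theta_k) + Im(u)*sin(theta_k)|."""
    return max(abs(u.real * math.cos(2 * math.pi * k / p) + u.imag * math.sin(2 * math.pi * k / p))
               for k in range(p))


def smallest_p(rel_err, logp_max=30):
    """Smallest p = 2**LOGP, LOGP >= 1, for which the worst-case relative gap
    sec(pi/p) - 1 allowed by (11) is below rel_err."""
    for logp in range(1, logp_max + 1):
        p = 2 ** logp
        if 1.0 / math.cos(math.pi / p) - 1.0 < rel_err:
            return p
    return None


# Check the two-sided bound (11) on a few sample points.
for u in (3 + 4j, -1 + 0.2j, 0.7 - 2.5j):
    for p in (4, 16, 1024):
        d = abs_d(u, p)
        assert d <= abs(u) + 1e-12 and abs(u) <= d / math.cos(math.pi / p) + 1e-12

print(smallest_p(0.5e-5))  # 1024
```

The same function shows why p = 2 suffices for real data: with θ_1 = 0 and θ_2 = π, `abs_d(x, 2)` equals |x| for any real x.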

The error incurred by solving the Discretized Problem (5)-(8) instead of the
original Problem (1)-(4) is given in [1, Theorem 2], which is repeated here for
the sake of completeness.

THEOREM 2. Let $\varepsilon^* \in \mathbb{R}$ and $z^* \in \mathbb{C}^n$ solve Problem (1)-(4), and let $\varepsilon^{**} \in \mathbb{R}$
and $z^{**} \in \mathbb{C}^n$ solve the Discretized Problem (5)-(8). Then

$$\varepsilon^{**} \le \varepsilon^{*} \le \varepsilon^{**}\sec\frac{\pi}{p}, \tag{12}$$



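and

$$
\begin{aligned}
|z^{**}A_j - f_j| &\le \varepsilon^{**}\sec(\pi/p), & j &= 1, \ldots, m,\\
|z^{**}B_j - g_j| &\le c_j \sec(\pi/p), & j &= 1, \ldots, l,\\
|z^{**}_j - h_j| &\le d_j \sec(\pi/p), & j &= 1, \ldots, n.
\end{aligned}
$$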


It is clear from this result that the optimal ε in the Discretized Problem converges
to the optimal ε in the original Problem quadratically as $p \to \infty$, because
$\sec(\pi/p) - 1 \approx \pi^2/(2p^2)$; however, the optimal z vectors need converge only linearly
as $p \to \infty$. For a simple example, see [3].
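The quadratic rate follows from the size of the factor in (12). A short numerical check that assumes nothing beyond (12): the gap sec(π/p) - 1 shrinks by a factor close to 4 each time p doubles, whereas a first-order quantity such as (1/2)tan(π/p) (a hypothetical stand-in for a linearly converging coefficient) only halves.

```python
import math


def sec_gap(p):
    """Relative gap sec(pi/p) - 1 allowed between eps** and eps* by (12)."""
    return 1.0 / math.cos(math.pi / p) - 1.0


def half_tan(p):
    """A first-order quantity for comparison: (1/2)*tan(pi/p), roughly pi/(2p)."""
    return 0.5 * math.tan(math.pi / p)


for p in (64, 128, 256, 512):
    # Ratios of successive values as p doubles: about 4 (quadratic) versus about 2 (linear).
    print(p, sec_gap(p) / sec_gap(2 * p), half_tan(p) / half_tan(2 * p))
```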

The Discretized Problem is a dense linear program in 2n + 1 real variables
and (m + n + l)p inequalities. It is solved numerically by solving its dual using
the revised simplex method with explicitly held inverse. Even for modest values
of m, n, l, and p the dual is a very large linear program. Fortunately, it also has
special structure which can be used very effectively to greatly reduce total
computational effort. Instead of requiring the (2n + 1)(m + n + l)p storage
locations that would be necessary in a straightforward implementation, this
implementation requires only 2n(m + l) + 2p locations. Moreover, a straightforward
approach would require O((m + l)np) real multiplications to determine
the most negative reduced cost (and hence the entering basic variable) in
each simplex iteration. This implementation requires only O((m + l)n) +
O((m + n + l) log2 p) real multiplications for the same purpose. In other words,
the discretization parameter p does not significantly affect the computational
effort of a single simplex iteration. The size of p primarily affects only the
total number of iterations necessary to reach the optimal solution. The details are
given in [1].
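To make the storage comparison concrete, the two counts quoted above can be evaluated for the sizes of the example in Section 2 (m = 285 data points, l = 2 derivative bounds, n = 8, p = 1024). The counts are taken at face value from the text, in its unit of "locations"; the function names are ours.

```python
def storage_dense(m, n, l, p):
    """Locations for the dual LP stored outright: (2n + 1) rows by (m + n + l)p columns."""
    return (2 * n + 1) * (m + n + l) * p


def storage_caprox(m, n, l, p):
    """Locations quoted for this implementation: 2n(m + l) + 2p."""
    return 2 * n * (m + l) + 2 * p


# Sizes taken from the example in Section 2 (285 boundary points, 2 derivative bounds).
m, n, l, p = 285, 8, 2, 1024
print(storage_dense(m, n, l, p), storage_caprox(m, n, l, p))  # 5135360 6640
```

Only the 2p term depends on the discretization, which is why the text describes the growth in storage with p as "precisely 2p".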

The  revised  simplex  method  with  pivoting  to  update  the  basis  inverse  is  known 
to  be  numerically  unstable.  Should  a  stable  version  become  necessary,  one  can 
update  the  QR  factors  of  the  basis  instead.  The  cost  is  a  bit  more  computational 
effort  in  each  simplex  iteration.  In  practice,  however,  fewer  iterations  may  be 
necessary  with  QR  updating  because  of  its  stability.  Consequently,  total  CPU 
time  may  not  be  significantly  affected. 

As  was  just  described,  the  growth  of  computer  storage  as  a  function  of  p  is 
precisely  2p.  This  is  quite  satisfactory  for  all  but  the  most  demanding  of 
applications.  It  is  possible,  however,  to  make  the  algorithm’s  storage  requirements 
independent  of  p  with  slightly  more  computational  effort  per  simplex  iteration. 
Similarly, as a function of p, the multiplication count per simplex iteration grows
as log2 p, but it is possible to alter the algorithm so that this count is independent
of p. Reprogramming the code given here to effect this modification should not
be  too  difficult,  if  it  ever  becomes  desirable  to  do  so.  Theoretically,  then,  the 
Discretized  Problem  can  be  solved  by  an  algorithm  whose  storage  requirements 



and multiplication count per iteration are independent of the discretization
parameter p; only the total number of iterations need remain dependent on p.

There  are  four  subroutines  in  the  package. 

CAPROX  This is the main routine that implements the revised simplex
method to solve the dual of the Discretized Problem.

CPAIRS  This subroutine prints the optimal basis names (if requested) of
the Discretized Problem so that natural pairings (see [1]) in the
optimal basis are immediately apparent.

CEND  This subroutine stores the best computed solution in the proper
location prior to exit from CAPROX.

PABS  This is a subroutine that solves the optimization subproblem (9)
for a given complex number u; that is, it computes the maximum
in (9) and also the minimal clockwise angle θ_k for which this
maximum occurs.

These  four  routines  must  be  used  together,  but  only  CAPROX  need  be  called  by 
users  of  the  algorithm.  They  have  been  tested  on  the  VAX  11/780,  and  they  have 
all  been  verified  by  the  PFORT  verifier  [4]  for  portability. 
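The subproblem solved by PABS has a simple geometric form: writing u = |u| exp(iφ), each term in (9) equals |u| cos(φ - θ_k), so the maximum is attained at the grid angle nearest to arg u. The sketch below uses that observation to evaluate (9) without scanning all p angles. It is not the PABS source; the function name, tie-breaking, and angle convention are our assumptions.

```python
import cmath
import math


def pabs_sketch(u, p):
    """Return (|u|_D, k) for the discretized absolute value (9), where k (1-based, as in (10))
    indexes the grid angle theta_k = (k-1)*2*pi/p nearest to arg(u).
    Ties between two equally near angles go to the larger index (an arbitrary choice)."""
    r = abs(u)
    phi = cmath.phase(u) % (2 * math.pi)          # arg(u) in [0, 2*pi)
    step = 2 * math.pi / p                         # spacing of the angles in (10)
    k0 = int(math.floor(phi / step + 0.5)) % p     # 0-based index of the nearest angle
    return r * math.cos(phi - k0 * step), k0 + 1


print(pabs_sketch(3 + 4j, 4), pabs_sketch(3 + 4j, 1024))
```

Since the nearest angle is within π/p of arg u, the returned value is at least |u| cos(π/p), which is the lower bound in (11).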

In general, p cannot be taken equal to 2 without losing the desirable approximation
properties of the Discretized Problem. In some special cases letting
p = 2 will work, for example, when the problem is entirely real valued. From
(10), for p = 2 we have θ_1 = 0 and θ_2 = π, so that |u|_D = |u| when u is real, as
it always will be in real valued problems. For this reason the implementation
allows LOGP = 1 as a legal input. Most problems, however, will require that
LOGP > 2 for successful convergence to a desirable solution.

For those applications in which the solution vector z must be real valued even
though the matrices A and B and the vectors h, g, and f are all complex, a
different but highly similar set of FORTRAN subroutines has been provided.
The four subroutines in this package are KAPROX, KPAIRS, KEND, and
PABS. The routine PABS is the same one referred to above. All four routines
must be used together, all have been tested on the VAX 11/780, and all have
been verified by the PFORT verifier. These routines require less storage and are
significantly faster than those for the more general problem allowing complex solution
vectors.

2.  EXAMPLE 

The following numerical example (not included in [1]) is a constrained complex
function approximation problem on a disconnected domain. We approximate the
constant function 1 by polynomials of degree n, n ≥ 1, which have zero constant
terms. The domain is the union of a circle with center at 2i and radius 1 and a
square with center at -2i and sides of length 2. In addition, bounds are placed
on the magnitudes of the coefficients of the approximating polynomial as well as
on the magnitudes of its first two derivatives evaluated at the point 1.

To pose this problem in the form (1)-(4), we must first discretize the domain
boundary. Rather arbitrarily, we take 125 data points equispaced around the



circle  and  160  data  points  equispaced  around  the  square.  This  gives  about  the 
same  spacing  (as  measured  by  arc  length)  on  both  the  circle  and  the  square. 
Explicitly,  for  precision’s  sake,  the  data  points  on  the  circle  are 


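$$u_j = i\left[2 + \exp\!\left(\frac{i(j-1)2\pi}{125}\right)\right], \qquad j = 1, \ldots, 125,$$

and the data points on the square are

$$
\begin{aligned}
u_{125+j} &= 1 - \left[1 + \frac{j-1}{20}\right] i, & j &= 1, \ldots, 40,\\
u_{165+j} &= \left[1 - \frac{j-1}{20}\right] - 3i, & j &= 1, \ldots, 40,\\
u_{205+j} &= -1 + \left[-3 + \frac{j-1}{20}\right] i, & j &= 1, \ldots, 40,\\
u_{245+j} &= \left[-1 + \frac{j-1}{20}\right] - i, & j &= 1, \ldots, 40.
\end{aligned}
$$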


Note  that  both  the  continuous  domain  and  the  discrete  domain  are  symmetric 
about  the  imaginary  axis. 
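The 285 data points can be generated as follows. This is a sketch under stated assumptions: the parametrization (circle points i[2 + exp(i(j - 1)2π/125)]; square traversed from 1 - i through 1 - 3i, -1 - 3i, and -1 - i with spacing 1/20) is our reading of a partly illegible display. We chose it because it gives u_126 = 1 - i, the point used later in the discussion of Table IV, and a point set symmetric about the imaginary axis, as stated above.

```python
import cmath
import math


def boundary_points():
    """Hypothetical reconstruction of u_1..u_285: 125 points on |u - 2i| = 1 and
    40 points on each side of the square with corners 1-i, 1-3i, -1-3i, -1-i."""
    circle = [1j * (2 + cmath.exp(2j * math.pi * k / 125)) for k in range(125)]
    t = [k / 20 for k in range(40)]                 # spacing 1/20 along each side
    right = [1 - (1 + s) * 1j for s in t]           # 1 - i  down to 1 - 3i
    bottom = [(1 - s) - 3j for s in t]              # 1 - 3i left to -1 - 3i
    left = [-1 + (-3 + s) * 1j for s in t]          # -1 - 3i up to -1 - i
    top = [(-1 + s) - 1j for s in t]                # -1 - i right to 1 - i
    return circle + right + bottom + left + top


pts = boundary_points()
print(len(pts), pts[125])
```

The circle spacing 2π/125 ≈ 0.0503 and the square spacing 0.05 agree with the remark that both boundaries are sampled at about the same arc-length spacing.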

The  components  of  the  solution  vector  z  of  (l)-(4)  represent  the  coefficients 
of  the  approximating  polynomial  in  this  problem.  Hence,  the  inequalities  (2)  are 
written  simply 


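$$|zA_j - f_j| = \Big|\sum_{k=1}^{n} z_k u_j^k - 1\Big| \le \varepsilon, \qquad A_j = [u_j \;\; u_j^2 \;\; \cdots \;\; u_j^n]^T, \quad f_j = 1, \qquad j = 1, \ldots, 285.$$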


The  general  bounds  (3)  express  the  derivative  constraints  by  defining 


$$B_1 = [1 \;\; 2 \;\; 3 \;\; \cdots \;\; n]^T, \qquad B_2 = [0 \;\; 2 \;\; 6 \;\; \cdots \;\; n(n-1)]^T,$$

$$g_1 = g_2 = 0, \qquad c_1 = c_2 = 3/4.$$



The  coefficient  bounds  are  expressed  by  the  inequalities  (4);  for  illustrative 
purposes,  we  take 


'1  =  0} 
dj  -  i  j 


j  “  1,  *  •  • »  rt. 


Finally, we set p = 1024 and solve the Discretized Problems for ε** and z**.
See Table I. For 1 ≤ n ≤ 4, the problems might as well lack constraints of type
(3) and (4) since these constraints are inactive at the optimal solution. For
5 ≤ n ≤ 8, exactly one constraint of type (3) and one of type (4) are active at the
solution. Optimal vectors z** for n = 4 and n = 8 are given in Table II.

The  discretized  complex  u-domain  is  symmetric  about  the  imaginary  axis. 
Hence, from [5, pp. 26-27], if general bounds (3) and simple bounds (4) are made
so  loose  that  they  are  never  active  at  optimal  solutions,  this  problem  must  have 
solution  vectors  z  with  alternately  pure  real  and  pure  imaginary  components.  As 
Table  II  clearly  shows,  this  effect  need  not  occur  when  constraints  of  type  (3) 
and  (4)  are  active  at  optimality. 

Under  the  additional  requirement  that  the  solution  vector  z  be  real  valued,  the 
same  problem  was  solved  using  subroutine  KAPROX.  The  results  are  summarized 
in Tables III and IV. We note that simple bounds (4) are not active for
1 ≤ n ≤ 8 and the general bounds (3) are not active for 1 ≤ n ≤ 7. In this problem,
when  the  bounds  (3)  and  (4)  are  not  active  at  optimality,  every  real  solution 
vector  must  have  odd  numbered  components  which  are  zero.  (This  follows  easily 
from  symmetry  properties  in  the  underlying  u-domain.)  Clearly,  from  Table  IV, 
when  a  general  bound  (3)  is  active,  the  odd  numbered  components  need  not  be 
zero. 
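Which bounds are active can be checked directly from the printed coefficients. The sketch evaluates zB_1 = p'(1) and zB_2 = p''(1), with B_1 = [1, 2, ..., n]^T and B_2 = [0, 2, 6, ..., n(n - 1)]^T, together with |z_2|, for the n = 8 solutions in Tables II and IV. The second derivative comes out at about 0.75 in magnitude for both solutions and |z_2| at about 0.2 for the complex one, consistent with a derivative bound of 3/4 and a coefficient bound of 0.2 being active at n = 8.

```python
# Coefficients copied from Table IV (real, n = 8) and Table II (complex, n = 8).
z_real_8 = [0.073535, -0.193224, 0.051396, -0.051975,
            0.008079, -0.006946, 0.000462, -0.000372]
z_cplx_8 = [0.040618 + 0.055840j, -0.199848 - 0.007808j, 0.020761 + 0.073041j,
            -0.020794 - 0.006209j, 0.003313 + 0.014802j, 0.001217 - 0.001490j,
            0.000184 + 0.000917j, 0.000186 - 0.000105j]


def derivatives_at_one(z):
    """Return (zB_1, zB_2) = (p'(1), p''(1)) for p(u) = sum_{k=1}^n z_k u^k."""
    d1 = sum(k * zk for k, zk in enumerate(z, start=1))
    d2 = sum(k * (k - 1) * zk for k, zk in enumerate(z, start=1))
    return d1, d2


for name, z in (("real", z_real_8), ("complex", z_cplx_8)):
    d1, d2 = derivatives_at_one(z)
    print(name, round(abs(d1), 4), round(abs(d2), 4), round(abs(z[1]), 4))
```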

The coefficients for n = 2 in Table IV deserve explanation. From Table III it
is apparent that the error in the best (real) approximation is 1, so it must be the
case that both coefficients are zero. So why is the second coefficient, $z_2$, equal to
-0.001534? This is an effect of the discretization process and of the fact that
coefficients need converge only linearly as p goes to infinity. Closer inspection
of the problem solution shows that the active constraint of type (2) for j = 126
is a point where the upper bound (12) is attained. Since $u_{126} = 1 - i$, $f_{126} = 1$,
$z_1 = 0$, $\varepsilon^{**} = 1$, and p = 1024, we have

$$|z^{**}A_{126} - f_{126}| = \varepsilon^{**}\sec\frac{\pi}{p}, \tag{13}$$

or

$$|1 - z_2(1 - i)^2| = \sec\frac{\pi}{1024}.$$

Solving for $z_2$ gives

$$z_2 = -\frac{1}{2}\tan\frac{\pi}{1024} \approx -0.00153398.$$

It would appear that $z_2$ satisfies (13) for all p; if so, it will never equal zero
precisely and will converge to zero only linearly.
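The value of z_2 can be checked in a few lines. With z_1 = 0 and u_126 = 1 - i we have (1 - i)^2 = -2i, so |1 - z_2(1 - i)^2| = (1 + 4z_2^2)^(1/2), which equals sec(π/p) exactly when |z_2| = (1/2)tan(π/p); the negative root matches the printed coefficient.

```python
import math

p = 1024
# Negative root of |1 - z2*(1 - i)^2| = sec(pi/p), with z1 = 0 and f_126 = 1.
z2 = -0.5 * math.tan(math.pi / p)
lhs = abs(1 - z2 * (1 - 1j) ** 2)   # (1 - i)^2 = -2i, so lhs = sqrt(1 + 4*z2^2)
rhs = 1.0 / math.cos(math.pi / p)
print(f"{z2:.8f}", abs(lhs - rhs) < 1e-12)

# Doubling p roughly halves z2: the coefficient tends to zero only linearly.
print([round(-0.5 * math.tan(math.pi / q), 8) for q in (1024, 2048, 4096)])
```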



Table I. Optimal Complex Solutions Using Subroutine CAPROX

Order n   ε** = optimal ε   Iterations   Time in seconds (VAX 11/780)
1         1.000000             3           1
2         0.973568            41           4
3         0.951666            84           8
4         0.905695           169          18
5         0.848662           219          22
6         0.848541           312          33
7         0.827420           705          87
8         0.825552           753         108

Table II. Optimal Complex Solution Vectors z** Using Subroutine CAPROX

Component   n = 4                   n = 8
1           .000000 + .087000 i     .040618 + .055840 i
2           -.195483 + .000000 i    -.199848 - .007808 i
3           .000000 + .026738 i     .020761 + .073041 i
4           -.014866 + .000000 i    -.020794 - .006209 i
5                                   .003313 + .014802 i
6                                   .001217 - .001490 i
7                                   .000184 + .000917 i
8                                   .000186 - .000105 i

Table III. Optimal Real Solutions Using Subroutine KAPROX

Order n   ε** = optimal ε   Iterations   Time in seconds (VAX 11/780)
1         1.000000             2           1
2         1.000000            12           2
3         1.000000             5           1
4         0.977278            49           6
5         0.977278            44           5
6         0.948679            74           9
7         0.948679            71           9
8         0.877424           157          18

Table IV. Optimal Real Solution Vectors z** Using Subroutine KAPROX

Component   n = 2       n = 4       n = 6       n = 8
1           0.000000    0.000000    0.000000    0.073535
2           -0.001534   -0.105599   -0.156179   -0.193224
3                       0.000000    0.000000    0.051396
4                       -0.011453   -0.025029   -0.051975
5                                   0.000000    0.008079
6                                   -0.001251   -0.006946
7                                               0.000462
8                                               -0.000372



ACKNOWLEDGMENT 

The  author  would  like  to  thank  Marvin  J.  Goldstein  of  the  Naval  Underwater 
Systems  Center  for  making  available  software  [6]  that  significantly  eased  the 
burden  of  producing  readable  FORTRAN  code.  Without  the  use  of  this  software 
package,  bringing  the  initial  versions  of  subroutines  CAPROX  and  KAPROX 
into  compliance  with  certain  ACM  publication  standards  would  have  been 
extremely  tedious. 

REFERENCES 

1. Streit, R. L. Solution of systems of complex linear equations in the l∞ norm with constraints
on the unknowns. SIAM J. Sci. Stat. Comput., to be published in Jan. 1986. (Also in Tech. Rep.
SOL 83-3, Department of Operations Research, Stanford University, Stanford, CA, March 1983.)

2. Streit, R. L., and Nuttall, A. H. A note on the semi-infinite programming approach to
complex approximation. Math. Comp. 40 (1983), 599-605.

3. Opfer, G. Solving complex approximation problems by semi-infinite optimization techniques.
Numer. Math. 39 (1982), 411-420.

4. Ryder, B. G. The PFORT verifier. Softw. Pract. Exper. 4 (1974), 359-377.

5. Meinardus, G. Approximation of Functions: Theory and Numerical Methods. Springer-Verlag,
1967.

6. Goldstein, M. J., and Lawson, J. R., Jr. A new program aid in producing structured
FORTRAN programs. NUSC Tech. Memo. 821162, Naval Underwater Systems Center, New
London, CT, 9 Nov. 1982.


Received  August  1983;  accepted  May  1985 




Polynomial  Iteration  For  Nonsymmetric 
Indefinite  Linear  Systems 

H.  C.  Elman  and  R.  L.  Streit 




Polynomial Iteration for Nonsymmetric Indefinite Linear Systems

Howard C. Elman
Yale University
Department of Computer Science
New Haven, CT

Roy L. Streit
Naval Underwater Systems Center
New London, CT


Abstract

We examine iterative methods for solving sparse nonsymmetric indefinite systems of linear equations. The methods considered include a new adaptive method based on polynomials that satisfy an optimality condition in the Chebyshev norm, the conjugate-gradient-like method GMRES, and the conjugate gradient method applied to the normal equations. Numerical experiments on several non-self-adjoint indefinite elliptic boundary value problems suggest that none of these methods is dramatically superior to the others. Their performance in solving moderately difficult problems is satisfactory, but for harder problems their convergence is slow.

In recent years there has been significant progress in the development of iterative methods for solving sparse real linear systems of the form

$$Au = b, \tag{1.1}$$

where $A$ is a nonsymmetric matrix of order $N$. One key to this progress has been the derivation of polynomial-based methods, i.e., methods whose $m$-th approximate solution iterate has the form

$$u_m = u_0 + q_{m-1}(A)\, r_0, \tag{1.2}$$

where $u_0$ is an initial guess for the solution, $r_0 = b - Au_0$, and $q_{m-1}$ is a real polynomial of degree $m - 1$. The residual $r_m = b - Au_m$ satisfies

$$r_m = [I - A\, q_{m-1}(A)]\, r_0 = p_m(A)\, r_0, \tag{1.3}$$

where $p_m$ is a real polynomial of degree $m$ such that $p_m(0) = 1$. Applying any norm to (1.3) gives

$$\|r_m\| \le \|p_m(A)\|\, \|r_0\|.$$

Moreover, if $A$ is diagonalizable as $A = U\Lambda U^{-1}$, then

$$\|p_m(A)\| = \|U p_m(\Lambda) U^{-1}\| \le \|U\|\, \|U^{-1}\| \max_{\lambda \in \sigma(A)} |p_m(\lambda)|,$$

so that

$$\|r_m\| \le \|U\|\, \|U^{-1}\| \max_{\lambda \in \sigma(A)} |p_m(\lambda)|\, \|r_0\|. \tag{1.4}$$

The work presented in this paper was supported by the U.S. Office of Naval Research under contract N00014-tt-K-0614, by the U.S. Army Research Office under contract DAAG-SS-0177, and by the Naval Underwater Systems Center Independent Research Project A70109.

Thus any polynomial $p_m$ that is sufficiently small on the eigenvalues of $A$ is a good candidate for generating an iterative method.
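As a small numerical check of (1.4), the Python sketch below takes a diagonal $A$, so that $U = I$ and the factor $\|U\|\,\|U^{-1}\|$ equals 1. It performs one polynomial iteration of the form (1.2) and compares $\|r_m\|_2$ with the right side of (1.4). The eigenvalues, right-hand side and coefficients of $q_1$ are arbitrary illustrative choices, not data from the paper.

```python
# Sketch: for a diagonal A (so U = I and ||U|| ||U^-1|| = 1), check that the
# polynomial iteration (1.2) gives r_m = p_m(A) r_0 and satisfies bound (1.4).
# The eigenvalues, right-hand side and coefficients are illustrative only.

def poly_eval(coeffs, z):
    """Evaluate sum_j coeffs[j] * z**j by Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * z + c
    return acc

def norm2(v):
    return sum(abs(x) ** 2 for x in v) ** 0.5

eigs = [0.5, 1.0, 1.5, 2.0]       # diagonal entries of A, i.e. sigma(A)
b = [1.0, -2.0, 0.5, 3.0]
u0 = [0.0] * len(eigs)
r0 = [bi - lam * ui for bi, lam, ui in zip(b, eigs, u0)]

q = [1.2, -0.4]                   # q_1(z) = 1.2 - 0.4 z, so m = 2

# (1.2): u_m = u_0 + q_{m-1}(A) r_0; A is diagonal, so it acts entrywise.
um = [ui + poly_eval(q, lam) * ri for ui, lam, ri in zip(u0, eigs, r0)]
rm = [bi - lam * ui for bi, lam, ui in zip(b, eigs, um)]

def p(z):
    """Residual polynomial p_m(z) = 1 - z q_{m-1}(z), with p_m(0) = 1."""
    return 1.0 - z * poly_eval(q, z)

bound = max(abs(p(lam)) for lam in eigs) * norm2(r0)   # right side of (1.4)
print(norm2(rm), "<=", bound)
```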

The conjugate gradient and Chebyshev methods are well-known polynomial-based methods for solving symmetric positive-definite systems. For these systems the residual polynomials $\{p_m\}$ have desirable optimality properties [8]. Generalizations of these techniques have been developed for solving both symmetric indefinite systems (see e.g. [3, 4, 17, 18]) and nonsymmetric systems with definite symmetric part $(A + A^T)/2$ (see e.g. [5, 8, 14] and references therein). In the latter case, all of the eigenvalues of $A$ lie in either the right half or the left half of the complex plane. Sparse linear systems that are both nonsymmetric and have an indefinite symmetric part arise in numerous settings. Examples include the discretization of the Helmholtz equation for modelling acoustic phenomena [1] and the discretization of the coupled partial differential equations arising in numerical semiconductor device simulation [12]. Gradient methods that have been proposed as solvers for such problems include the conjugate gradient method applied to the normal equations (CGN) [9], the biconjugate gradient method [7], the restarted generalized minimum residual method (GMRES) [20], and new methods presented in [11, 26]. Smolarski and Saylor [22] and Saad [19] have proposed adaptive polynomial iteration methods of the form (1.2) using polynomials that are optimal with respect to a weighted least squares norm. In this paper, we introduce a polynomial-based method, PSUP. PSUP computes a polynomial that is nearly optimal with respect to the Chebyshev norm on a region containing the eigenvalue estimates, and then uses this polynomial in (1.2). We compare its performance with the two gradient methods CGN and GMRES.

In  Section  2,  we  give  a  brief  description  of  the  gradient  methods  CGN  and  GMRES.  In 
Section  3,  we  describe  the  new  PSUP  method  and  several  heuristics  developed  to  improve  its 
performance.  In  Section  4,  we  describe  numerical  experiments  in  which  these  three  methods 
are  used  to  solve  some  non-self-adjoint  indefinite  elliptic  problems,  and  in  Section  5  we  draw 
conclusions  based  on  the  numerical  tests. 

2. Gradient Methods

In this section we briefly review two conjugate gradient-like methods for solving nonsymmetric indefinite systems. The conjugate gradient method [9] is applicable only to symmetric positive definite linear systems. For nonsymmetric systems, it can be used to solve the normal equations $A^TAu = A^Tb$. The scaled residuals $\{A^Tr_m\}$ satisfy

$$A^Tr_m = p_m(A^TA)\, A^Tr_0,$$

where $p_m$ is the unique polynomial of degree $m$ such that $p_m(0) = 1$ and $\|r_m\|_2$ is minimum. As is well known, the condition number of $A^TA$ is the square of that of $A$. Moreover, the standard implementation of CGN requires two matrix-vector products at each iteration, one by $A$ and one by $A^T$, plus $5N$ additional operations. The storage requirement is $4N$ words. The dependence of CGN on $A^TA$ has led to efforts to find alternatives that converge more rapidly and are less expensive per step. For nonsymmetric systems with positive definite symmetric part, several methods have been shown to be superior to CGN [5].
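The CGN iteration described above can be sketched as follows. The sketch assumes a small dense matrix stored as a list of rows and uses pure Python with no libraries. It is the textbook conjugate gradient recurrence applied to $A^TAu = A^Tb$, arranged so that each step uses one product by $A$ and one by $A^T$. It is an illustration, not the authors' implementation, and the stopping test on $\|r\|_2/\|b\|_2$ is an assumption.

```python
def matvec(A, x):
    """y = A x for a dense matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matvec_t(A, x):
    """y = A^T x."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(x, y):
    return sum(a * c for a, c in zip(x, y))

def cgn(A, b, u0, tol=1e-10, maxit=100):
    """Conjugate gradients on A^T A u = A^T b (CGNR form); returns (u, r)."""
    u = list(u0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, u))]   # r = b - A u
    s = matvec_t(A, r)                                # s = A^T r
    p = list(s)
    gamma = dot(s, s)
    bnorm = dot(b, b) ** 0.5
    for _ in range(maxit):
        if dot(r, r) ** 0.5 <= tol * bnorm:
            break
        q = matvec(A, p)                      # product by A
        alpha = gamma / dot(q, q)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = matvec_t(A, r)                    # product by A^T
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return u, r
```

For example, with the nonsymmetric matrix `[[2, 1, 0], [0, -1, 3], [1, 0, 1]]` (indefinite symmetric part) and `b = [4, 7, 4]`, `cgn` recovers the solution `[1, 2, 3]`.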

GMRES is a method proposed for solving nonsymmetric indefinite systems that avoids the use of the normal equations [20]. Given an initial guess $u_0$ for the solution, with residual $r_0$, this method uses Arnoldi's method to generate an orthonormal basis $\{v_1, \ldots, v_m\}$ for the Krylov space

$$K_m = \mathrm{span}\{r_0, Ar_0, \ldots, A^{m-1}r_0\}.$$

Let $v_1 = r_0/\|r_0\|_2$. The Arnoldi process computes, for $j = 1, \ldots, m$,

$$h_{ij} = (Av_j, v_i), \qquad i = 1, \ldots, j,$$

$$\hat v_{j+1} = Av_j - \sum_{i=1}^{j} h_{ij} v_i,$$

$$h_{j+1,j} = \|\hat v_{j+1}\|_2,$$

$$v_{j+1} = \hat v_{j+1} / h_{j+1,j}.$$
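A direct transcription of the Arnoldi recurrences above is sketched below. It assumes the matrix is available only through a user-supplied `matvec` function. It uses classical Gram-Schmidt exactly as written, with all $h_{ij}$ computed from $Av_j$, and it assumes no breakdown occurs ($h_{j+1,j} \ne 0$). Indices are 0-based, so `H[i][j]` holds $h_{i+1,j+1}$.

```python
def dot(x, y):
    return sum(a * c for a, c in zip(x, y))

def norm(x):
    return dot(x, x) ** 0.5

def arnoldi(matvec, r0, m):
    """Run m steps of the Arnoldi process started from v_1 = r0 / ||r0||_2.

    Returns V = [v_1, ..., v_{m+1}] and Hbar, the (m+1) x m Hessenberg matrix
    with entries h_ij (stored 0-based). Assumes no breakdown (h_{j+1,j} != 0).
    """
    beta = norm(r0)
    V = [[x / beta for x in r0]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(V[j])                              # A v_j
        for i in range(j + 1):
            H[i][j] = dot(w, V[i])                    # h_ij = (A v_j, v_i)
        for i in range(j + 1):
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = norm(w)                         # h_{j+1,j}
        V.append([x / H[j + 1][j] for x in w])        # v_{j+1}
    return V, H
```

By construction the output satisfies $Av_j = \sum_{i=1}^{j+1} h_{ij} v_i$, which is a convenient check on any implementation.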

GMRES then computes an approximate solution

$$u_m = u_0 + \sum_{j=1}^{m} \alpha_j v_j, \tag{2.1}$$

where the scalars $\{\alpha_j\}_{j=1}^{m}$ are chosen so that $\|r_m\|_2$ is minimum. These scalars can be computed by solving the upper Hessenberg least squares problem

$$\min_{\alpha \in \mathbb{R}^m} \big\| \, \|r_0\|_2\, e_1 - \bar H_m \alpha \, \big\|_2,$$

where $e_1 = (1, 0, \ldots, 0)^T \in \mathbb{R}^{m+1}$ and $\bar H_m$ is the Hessenberg matrix of size $(m+1) \times m$ whose $(i,j)$-entry is $h_{ij}$ [20]. By the choice of basis and the minimization property, $r_m = p_m(A)r_0$, where $p_m$ is the real polynomial of degree $m$ such that $p_m(0) = 1$ and $p_m$ is optimal with respect to the residual norm $\|r_m\|_2$ (cf. [8] for other formulations of this optimal iteration).

In a practical implementation, the dimension $m$ of the Krylov space is fixed, and the GMRES iteration is restarted with $u_m$ in place of $u_0$. This is the GMRES($m$) method. Define one “step” as the total cost of the $m$-fold iteration divided by $m$. With this definition, the cost per step is $(m + 3 + 1/m)N$ operations plus one matrix-vector product. GMRES($m$) requires $(m + 2)N$ words of storage.
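The restarted GMRES($m$) iteration can be sketched as follows. This is a minimal illustration that assumes a user-supplied `matvec`. It uses the modified Gram-Schmidt form of the Arnoldi loop and applies Givens rotations to solve the Hessenberg least squares problem incrementally. Both are common implementation choices rather than details given in the text. The running value `abs(g[k])` equals the current residual norm, so the inner loop can stop early.

```python
import math

def dot(x, y):
    return sum(a * c for a, c in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def gmres(matvec, b, u0, m, tol=1e-10, max_restarts=200):
    """Restarted GMRES(m): Arnoldi + Givens-rotation least squares solve."""
    u = list(u0)
    bnorm = norm(b)
    for _ in range(max_restarts):
        r = [bi - yi for bi, yi in zip(b, matvec(u))]
        beta = norm(r)
        if beta <= tol * bnorm:
            break
        V = [[x / beta for x in r]]
        H = [[0.0] * m for _ in range(m + 1)]
        g = [beta] + [0.0] * m              # ||r_0||_2 e_1, rotated in place
        cs, sn = [], []
        k = 0
        for j in range(m):
            w = matvec(V[j])
            for i in range(j + 1):          # modified Gram-Schmidt variant
                H[i][j] = dot(w, V[i])
                w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
            hn = norm(w)
            H[j + 1][j] = hn
            for i in range(j):              # apply earlier rotations to column j
                t = cs[i] * H[i][j] + sn[i] * H[i + 1][j]
                H[i + 1][j] = -sn[i] * H[i][j] + cs[i] * H[i + 1][j]
                H[i][j] = t
            d = math.hypot(H[j][j], H[j + 1][j])
            c, s = H[j][j] / d, H[j + 1][j] / d
            cs.append(c)
            sn.append(s)
            H[j][j], H[j + 1][j] = d, 0.0
            g[j + 1] = -s * g[j]            # |g[j+1]| = current ||r||_2
            g[j] = c * g[j]
            k = j + 1
            if hn == 0.0 or abs(g[k]) <= tol * bnorm:
                break
            V.append([x / hn for x in w])
        y = [0.0] * k                       # back substitution: R y = g
        for i in range(k - 1, -1, -1):
            y[i] = (g[i] - sum(H[i][jj] * y[jj] for jj in range(i + 1, k))) / H[i][i]
        for jj in range(k):                 # u_m = u_0 + sum_j y_j v_j, as in (2.1)
            u = [ui + y[jj] * vj for ui, vj in zip(u, V[jj])]
    return u
```

For a matrix whose symmetric part is positive definite, restarting every two steps (`m=2`) still converges. For indefinite systems like those studied here, restarted GMRES may converge slowly or stagnate.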

We remark that the Arnoldi process was originally developed as a technique for computing eigenvalues [27]. Let $V_m$ denote the matrix whose columns are the $m$ vectors generated by the Arnoldi step in GMRES($m$). Let $H_m$ denote the square upper Hessenberg matrix consisting of the first $m$ rows of $\bar H_m$. Then $V_m$ is an orthonormal matrix of order $N \times m$ that satisfies

$$V_m^T A V_m = H_m. \tag{2.2}$$

Relation (2.2) resembles a similarity transformation, and Arnoldi's method consists of using the eigenvalues of $H_m$ as estimates for (some of) the eigenvalues of $A$. Suppose $A = U\Lambda U^{-1}$ for diagonal $\Lambda$, and suppose $r_0$ is dominated by $m$ eigenvectors $\{u_j\}_{j=1}^{m}$ with corresponding eigenvalues $\{\lambda_j\}_{j=1}^{m}$. Then the residual after $m$ GMRES steps satisfies [6]

$$\|r_m\|_2 \le \|U\|_2\, \|U^{-1}\|_2\, c_m\, \|e\|_2,$$

where

$$c_m = \max_{k > m} \prod_{j=1}^{m} \frac{|\lambda_k - \lambda_j|}{|\lambda_j|}$$

and $e$ is the remaining component of $r_0$, which is orthogonal to $\{u_j\}_{j=1}^{m}$. Loosely speaking, GMRES($m$) damps out of the residual the components along the eigenvectors whose eigenvalues are computed by Arnoldi's method.
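The quantity $c_m$ is easy to evaluate for a given set of eigenvalues. The helper below is a sketch that treats the first $m$ entries of its input as the eigenvalues found by Arnoldi's method. The sample eigenvalues are hypothetical, chosen to show that a captured eigenvalue close to the origin makes $c_m$ large.

```python
def gmres_cm(eigs, m):
    """c_m = max_{k>m} prod_{j=1..m} |lambda_k - lambda_j| / |lambda_j|.

    eigs[:m] are the eigenvalues assumed to have been found by Arnoldi's
    method; eigs[m:] are the remaining ones. Works for complex values too.
    """
    captured, rest = eigs[:m], eigs[m:]
    worst = 0.0
    for lam_k in rest:
        prod = 1.0
        for lam_j in captured:
            prod *= abs(lam_k - lam_j) / abs(lam_j)
        worst = max(worst, prod)
    return worst

# A captured eigenvalue near the origin (0.1) inflates c_m.
print(gmres_cm([-1.0, 0.1, 1.0, 1.2], 2))   # about 24.2
```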

3. The PSUP Method

The gradient methods just described compute iterates and residuals that satisfy (1.2) and (1.3) (for CGN, with respect to $A^TA$). In these methods, the polynomials are built up recursively without explicit computation of their coefficients. In this section, we describe an alternative iteration that explicitly computes the coefficients of a polynomial $q_{m-1}(z)$ for which $p_m(z) = 1 - zq_{m-1}(z)$ is small on the spectrum $\sigma(A)$. In the following, we will refer to the polynomial $q_{m-1}(z)$ of (1.2) as the “iteration polynomial” and to the polynomial $p_m(z) = 1 - zq_{m-1}(z)$ of (1.3) as the “residual polynomial.”

Suppose a compact region $D \subset \mathbb{C}$ contains $\sigma(A)$. Let $p_m$ be a polynomial of degree $m$ that satisfies

$$p_m(0) = 1, \qquad \|p_m\|_D = \max_{z \in D} |p_m(z)| = \epsilon < 1.$$

As is evident from (1.4), an iteration having $p_m$ as its residual polynomial will decrease the residual norm if $\epsilon$ is small enough. The best possible iteration polynomial with respect to this norm (the Chebyshev norm) is the solution to the minimax problem

$$\epsilon = \min_{q_{m-1}} \max_{z \in D} |1 - zq_{m-1}(z)|. \tag{3.1}$$


Let $q_{m-1}(z) = \sum_{j=0}^{m-1} \alpha_j z^j$. The solution to (3.1) is also the Chebyshev solution to the infinite system of equations

$$\sum_{j=0}^{m-1} z^{j+1}\alpha_j = 1, \qquad z \in \partial D. \tag{3.2}$$

Only the boundary $\partial D$ need be considered because of the maximum modulus principle.
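To make (3.2) concrete, the sketch below samples $\partial D$ at finitely many points and evaluates the Chebyshev residual $\max_z |\sum_j z^{j+1}\alpha_j - 1|$ for a given coefficient vector. Taking $D$ to be a disk $|z - c| \le \rho$ with $0 \notin D$ is an assumption. It is chosen because the optimal residual polynomial for a disk is known in closed form, $p_m(z) = (1 - z/c)^m$ with $\epsilon = (\rho/|c|)^m$. PSUP itself handles general regions numerically, as described next. The values of `c`, `rho`, `m` and `M` are illustrative.

```python
import cmath
import math

def boundary_points(c, rho, M):
    """M equally spaced points on the circle |z - c| = rho."""
    return [c + rho * cmath.exp(2j * math.pi * k / M) for k in range(M)]

def cheb_residual(alpha, pts):
    """max over pts of |sum_j z^(j+1) alpha_j - 1| (the Chebyshev residual)."""
    worst = 0.0
    for z in pts:
        s = sum(a * z ** (j + 1) for j, a in enumerate(alpha))
        worst = max(worst, abs(s - 1))
    return worst

def disk_alpha(c, m):
    """alpha_j, j = 0..m-1, for which 1 - z q_{m-1}(z) = (1 - z/c)^m."""
    return [-math.comb(m, j + 1) * (-1 / c) ** (j + 1) for j in range(m)]

c, rho, m, M = 2.0 + 1.0j, 1.0, 4, 100
eps = cheb_residual(disk_alpha(c, m), boundary_points(c, rho, M))
print(eps, (rho / abs(c)) ** m)   # both approximately 0.04
```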

The PSUP method uses an iteration polynomial obtained from an approximate solution to (3.1). We briefly summarize the technique used; details can be found in [24]. First, (3.2) is replaced by a finite-dimensional problem

$$\sum_{j=0}^{m-1} z^{j+1}\alpha_j = 1, \qquad z \in \partial D_M, \tag{3.3}$$

where $\partial D_M$ is a finite subset of $\partial D$ containing $M$ points, $M > m$. Equation (3.3) is an overdetermined system of $M$ equations in the $m$ unknowns $\{\alpha_j\}_{j=0}^{m-1}$. The Chebyshev problem for (3.3) is given by

$$\min_{\{\alpha_j\}} \max_{z \in \partial D_M} \Big| \sum_{j=0}^{m-1} z^{j+1}\alpha_j - 1 \Big|. \tag{3.4}$$

Second, equation (3.4) is solved approximately using a semi-infinite linear programming approach to complex approximation. This approach is based on the identity $|w| = \max_{0 \le \theta < 2\pi} \mathrm{Re}(we^{-i\theta})$, $w \in \mathbb{C}$. Let $\Theta = \{\theta_1, \ldots, \theta_p\} \subset [0, 2\pi)$, and define the discretized absolute value

$$|w|_\Theta = \max_{\theta \in \Theta} \mathrm{Re}(we^{-i\theta}).$$

Consider the discretized problem

$$\min_{\{\alpha_j\}} \max_{z \in \partial D_M} \Big| \sum_{j=0}^{m-1} z^{j+1}\alpha_j - 1 \Big|_\Theta, \tag{3.5}$$

in which the absolute value in (3.4) is replaced by the discretized absolute value. This gives rise to a linear program for $\{\alpha_j\}_{j=0}^{m-1}$. Let $\epsilon^*$ denote the minimax value of $|\sum_{j=0}^{m-1} z^{j+1}\alpha_j - 1|$ at the solution to (3.4), and let $\epsilon_\Theta$ denote the minimax value for (3.5). It can be shown that

$$|w|_\Theta \le |w| \le |w|_\Theta \sec(a/2)$$

for all $w \in \mathbb{C}$, and consequently that

$$\epsilon_\Theta \le \epsilon^* \le \epsilon_\Theta \sec(a/2),$$

where $a$ is the largest difference (mod $2\pi$) between two neighboring angles in $\Theta$. For a given $p$, the upper bounds are sharpest when $\Theta$ consists of the arguments of the $p$-th roots of unity, so that $a = 2\pi/p$. We use this choice of $\Theta$ in the following, with $p = 256$, so that $\sec(a/2) = 1.000075$.
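The discretized absolute value and its bounds can be checked numerically. This sketch is an illustration that takes $\Theta$ to be the arguments of the $p$-th roots of unity with $p = 256$, as in the text. `abs_theta` computes $|w|_\Theta$ by brute force over $\Theta$, and the last line reproduces $\sec(a/2) = 1.000075$.

```python
import cmath
import math

def abs_theta(w, thetas):
    """Discretized absolute value |w|_Theta = max_{theta in Theta} Re(w e^{-i theta})."""
    return max((w * cmath.exp(-1j * t)).real for t in thetas)

p = 256
thetas = [2 * math.pi * k / p for k in range(p)]   # arguments of the p-th roots of unity
a = 2 * math.pi / p                                 # gap between neighboring angles
sec_half = 1.0 / math.cos(a / 2)

w = 3.0 + 4.0j
print(abs_theta(w, thetas), abs(w), abs_theta(w, thetas) * sec_half)
print(round(sec_half, 6))   # 1.000075
```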

The dual of the LP (3.5) can be written in the form

$$\min_{S \in \mathbb{R}^{M \times p},\; Q \in \mathbb{R}} \; \mathrm{Re}\big[e_M^T S\, e^{-i\Theta}\big]$$

subject to: $S \ge 0$, $Q \ge 0$, $Z^T S\, e^{-i\Theta} = 0 \in \mathbb{C}^m$, and

$$Q + \sum_{j=1}^{M} \sum_{k=1}^{p} S_{jk} = 1.$$

Here $e_M \in \mathbb{C}^M$ is the vector whose components are all 1, and $Z \in \mathbb{C}^{M \times m}$ is the coefficient matrix of (3.4). The vector $e^{-i\Theta} \in \mathbb{C}^p$ has $j$-th component $e^{-i\theta_j}$. $Q$ is a slack variable, which must be 0 if $\epsilon^* > 0$. A straightforward application of the simplex method to the dual requires $O(Mmp)$ multiplications per simplex iteration and $O(Mmp)$ storage locations. In [24], it is shown that the factor $p$ can be eliminated from these estimates by exploiting the special structure of the dual. These economies leave unaltered the sequence of basic feasible solutions that the simplex method generates en route to the solution. Moreover, they simplify further if the coefficients $\{\alpha_j\}$ are required to be real. In practice the number of simplex iterations has been observed to be $O(m)$, so that the computational effort to compute $\{\alpha_j\}$ using the algorithm in [23] is $O(Mm^2)$. In the experiments discussed below, both $M$ and $m$ are significantly smaller than the order $N$ of the linear system. Construction of the coefficients of the iteration polynomial is therefore a low-order cost of the solution process.

Given $u_0$ and $r_0$, the basic PSUP iteration consists of repeated application of the iteration polynomial $q_{m-1}$, as follows:

Algorithm 1: The PSUP iteration.

    For k = 1, 2, ... Do
        $u_{km} = u_{(k-1)m} + q_{m-1}(A)\, r_{(k-1)m}$
        $r_{km} = b - Au_{km}$

The actual computation $w \leftarrow q_{m-1}(A)\,r$ is performed using Horner's rule:

    $w \leftarrow \alpha_{m-1} r$
    For j = 1 to m - 1 Do
        $v \leftarrow Aw$
        $w \leftarrow \alpha_{m-1-j}\, r + v$

The $m$-fold PSUP iteration requires $m$ matrix-vector products and $m$ scalar-vector products, so that the “average” cost is one matrix-vector product and one scalar-vector product. PSUP requires $4N$ storage, for $u$, $r$, $v$ and $w$.
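Algorithm 1 translates almost line for line into the Python sketch below. The sketch assumes the matrix is available through a `matvec` function and that the coefficients are stored as `alpha[j]` $= \alpha_j$. The example at the end uses a hypothetical diagonal matrix whose eigenvalues lie in the disk $|z - 2| \le 1$, with $q_1(z) = 1 - z/4$. Then $p_2(z) = (1 - z/2)^2$, so each cycle multiplies every eigencomponent of the residual by at most $1/4$.

```python
def psup(matvec, b, u0, alpha, cycles):
    """Algorithm 1: apply q_{m-1}(z) = sum_j alpha[j] z^j repeatedly.

    Each cycle (m PSUP steps) updates u <- u + q_{m-1}(A) r and then
    recomputes r = b - A u, for m matrix-vector products per cycle.
    """
    m = len(alpha)
    u = list(u0)
    r = [bi - yi for bi, yi in zip(b, matvec(u))]
    for _ in range(cycles):
        w = [alpha[m - 1] * ri for ri in r]           # w <- alpha_{m-1} r
        for j in range(1, m):                         # Horner's rule
            v = matvec(w)                             # v <- A w
            w = [alpha[m - 1 - j] * ri + vi for ri, vi in zip(r, v)]
        u = [ui + wi for ui, wi in zip(u, w)]
        r = [bi - yi for bi, yi in zip(b, matvec(u))]
    return u, r

# Illustrative use: A = diag(1, 1.5, 2.5, 3) lies in the disk |z - 2| <= 1, and
# q_1(z) = 1 - z/4 gives p_2(z) = (1 - z/2)^2, so |p_2| <= 1/4 on sigma(A).
eigs = [1.0, 1.5, 2.5, 3.0]
diag_mv = lambda x: [lam * xi for lam, xi in zip(eigs, x)]
u, r = psup(diag_mv, [1.0] * 4, [0.0] * 4, [1.0, -0.25], cycles=20)
```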

In practice, the PSUP iteration needs estimates of the eigenvalues of $A$ in order to obtain the set $D$. Several adaptive techniques have been developed for combining an eigenvalue estimation procedure with polynomial iteration [6, 13, 19]. We will use the hybrid technique developed in [6, 19], which uses Arnoldi's method for eigenvalue estimates.

First, the Arnoldi process is used to compute some number $k_i$ of eigenvalue estimates before the PSUP iteration is executed. Given these estimates, a set $D$ is constructed that contains them, and the PSUP iteration polynomial $q_{m-1}$ is computed from $D$. (We discuss our choice for $D$ below.) One possible strategy is to perform the PSUP iteration with this polynomial until the iteration converges. However, there is no guarantee that the Arnoldi procedure computes all the extreme eigenvalues of $A$. The set $D$ is contained in the lemniscate region [10]

$$L_m = \{z \in \mathbb{C} : |p_m(z)| \le \epsilon\},$$

where $\epsilon$ and $p_m = 1 - zq_{m-1}(z)$ solve (3.1). Moreover, the modulus of $p_m$ is greater than $\epsilon$ outside $L_m$ and tends to grow rapidly outside $L_m$, at least in some directions. If an eigenvalue $\lambda$ lies outside $L_m$ and $|p_m(\lambda)|$ is large enough, then the PSUP method will diverge.

One way to avoid this behavior is to invoke the adaptive procedure: if PSUP diverges, then $k_a$ additional Arnoldi steps are performed to compute $k_a$ new eigenvalue estimates. These estimates are then used to construct a new enclosing set $D$ and a new iteration polynomial $q_{m-1}$, with which the PSUP iteration is resumed. A good choice for the starting vector $v_1$ is the last residual from the previous PSUP iteration, normalized to have unit norm. If PSUP diverges, the residual will tend to be dominated by the eigenvectors whose eigenvalues are not being damped out by the PSUP polynomial. Moreover, this technique can be improved using GMRES. Once the $k_a$ Arnoldi vectors are available, the GMRES($k_a$) iteration (2.1) can be performed at relatively little extra expense. This has the effect of damping out of the residual the eigenvector components that were being enhanced by the previous PSUP iteration.

Rather than use the PSUP iteration alone, we consider a hybrid GMRES-PSUP method that makes use of these observations. This method repeatedly performs some number $s$ of PSUP steps, followed by a smaller number $k_a$ of Arnoldi-GMRES steps. The initial eigenvalue estimates are provided by $k_i$ Arnoldi-GMRES steps, where $k_i$ may differ from $k_a$. In addition, the adaptive procedure is invoked immediately if the residual norm of the PSUP iteration increases by some tolerance $\tau$ relative to the smallest residual previously encountered. The following is a modification of the hybrid method developed in [6] that uses the PSUP iteration:

Algorithm 2: The hybrid GMRES-PSUP method.

    Choose $u_0$. Compute $r_0 = b - Au_0$.
    Until convergence Do
        Adaptive (Initialization) Steps: Set $v_1$ = the current normalized residual,
            perform $k_a$ (or $k_i$) Arnoldi/GMRES steps, and use the new eigenvalue
            estimates to update (or initialize) the PSUP coefficients.
        PSUP Steps: While ($\|r_j\|_2 / \|r_{\min}\|_2 < \tau$), where $r_{\min}$ is the smallest residual previously encountered,
            perform $s$ steps of the PSUP iteration (Algorithm 1) to
            update the approximate solution $u_j$ and residual $r_j$.

For the enclosing set $D$ we take the union of the four sets $D_j$, where $D_j$ is the convex hull of the set of eigenvalue estimates in the $j$-th quadrant of the complex plane. With this choice, if the extreme eigenvalues of each quadrant have been computed, then all the eigenvalues are contained in $D$. If all the eigenvalue estimates in either half-plane are real, then the part of $D$ containing these estimates is taken to be the line segment between the leftmost and rightmost estimates in that half-plane.
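One way to build the sets $D_j$ is sketched below. The estimates are grouped by quadrant, and the convex hull of each group is computed with Andrew's monotone chain algorithm. Assigning points on the real axis to the upper quadrants is an assumption of this sketch. Because collinear points collapse to their two endpoints, purely real estimates in a half-plane automatically yield the line segment between the leftmost and rightmost estimates.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; collinear input collapses to its two endpoints."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for pt in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], pt) <= 0:
            lower.pop()
        lower.append(pt)
    for pt in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], pt) <= 0:
            upper.pop()
        upper.append(pt)
    return lower[:-1] + upper[:-1]

def quadrant(z):
    """Quadrant 1-4; points on the real axis are assigned to the upper half."""
    if z.real >= 0:
        return 1 if z.imag >= 0 else 4
    return 2 if z.imag >= 0 else 3

def enclosing_set(estimates):
    """Map quadrant -> hull vertices of the eigenvalue estimates in it (the D_j)."""
    groups = {}
    for z in estimates:
        groups.setdefault(quadrant(z), []).append((z.real, z.imag))
    return {q: convex_hull(pts) for q, pts in groups.items()}
```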

There is no guarantee that the eigenvalue estimates computed by Arnoldi's method are accurate. Moreover, the PSUP residual polynomial has the value 1 at the origin. If $D$ contains points near the origin with both positive and negative real parts, the Chebyshev norm of the residual polynomial will therefore be very close to 1. (See Section 4 for an example.) We consider one heuristic designed to improve the performance of the hybrid PSUP method on problems with eigenvalues very near the origin. We successively remove the points closest to the origin from the set of eigenvalue estimates (and generate a smaller $D$) until the norm of the PSUP polynomial is smaller than some predetermined value $\eta$, and use that polynomial for the PSUP iteration.

There are two possible effects of this heuristic. If the deleted points are not accurate as eigenvalue estimates, then the resulting PSUP iteration will be just as robust as, and more rapidly convergent than, the iteration obtained with the deleted points included. On the other hand, if the deleted points are good estimates, then the PSUP polynomial will probably be large on the deleted points, and the iteration will not damp out the residual in the direction of the corresponding eigenvectors. However, if the dimension of this eigenspace is small (say, 2 or 3), then the iteration should damp out the residual in all other components, so that the residual should be dominated by a small number of components. In this situation, a small number of GMRES steps should damp out these dominant components. We will refer to the hybrid PSUP method with this heuristic added as the GMRES/Reduced-PSUP scheme.

We note that with the methods of [24], (3.5) can also be solved with the constraint

$$\max_{z \in E} \Big| \sum_{j=0}^{m-1} z^{j+1}\alpha_j - 1 \Big|_\Theta \le 1,$$

where $E$ is some finite set. In particular, if $E$ is the set of deleted eigenvalue estimates in the GMRES/Reduced-PSUP scheme, then the PSUP polynomial on the reduced set $D$ can be forced to be bounded in modulus by one on the deleted points. In experiments with this version of the GMRES/Reduced-PSUP iteration, we found its performance to be essentially the same as that of the unconstrained version described above.


4.  Numerical  Experiments 

In this section, we compare the performance of CGN, GMRES($m$), GMRES-PSUP and GMRES/Reduced-PSUP in solving several linear systems. These systems arise from a finite difference discretization of the differential equation

$$-\Delta u + 2P_1 u_x + 2P_2 u_y - P_3 u = f \ \text{ in } \Omega, \qquad u = g \ \text{ on } \partial\Omega, \tag{4.1}$$

where $\Omega$ is the unit square $\{0 < x, y < 1\}$, and $P_1$, $P_2$ and $P_3$ are positive parameters. We use $f = g = 0$, so that the solution to (4.1) is $u = 0$.

We discretize (4.1) by finite differences on a uniform $n \times n$ grid, using centered differences for the Laplacian and the first derivatives. Let $h = 1/(n+1)$. After scaling by $h^2$, the matrix equation has the form (1.1), in which the typical equation for the unknown $u_{ij} \approx u(ih, jh)$ is

$$(4 - \alpha)u_{ij} - (1 + \beta)u_{i-1,j} + (-1 + \beta)u_{i+1,j} - (1 + \gamma)u_{i,j-1} + (-1 + \gamma)u_{i,j+1} = h^2 f_{ij},$$

where $\beta = P_1 h$, $\gamma = P_2 h$, $\alpha = P_3 h^2$ and $f_{ij} = f(ih, jh)$. The eigenvalues of $A$ are given by [21]

$$4 - \alpha + 2\sqrt{1 - \beta^2}\, \cos\frac{s\pi}{n+1} + 2\sqrt{1 - \gamma^2}\, \cos\frac{t\pi}{n+1}, \qquad 1 \le s, t \le n.$$

The eigenvalues of the symmetric part are

$$4 - \alpha + 2\cos\frac{s\pi}{n+1} + 2\cos\frac{t\pi}{n+1}, \qquad 1 \le s, t \le n.$$

The leftmost eigenvalue of the symmetric part, corresponding to $s = t = n$, is given by

$$(2\pi^2 - P_3)h^2 + O(h^4),$$

so that for small enough $h$ the symmetric part is indefinite when $P_3 > 2\pi^2$.
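The eigenvalue formulas can be checked with a few lines of Python. The sketch below is an illustration using the grid size quoted later in the text ($n = 31$). It evaluates the eigenvalues of $A$ with a complex square root, so that $\beta > 1$ or $\gamma > 1$ produces complex eigenvalues. It then compares the leftmost eigenvalue of the symmetric part with its $O(h^4)$ approximation.

```python
import cmath
import math

def eigenvalues(n, P1, P2, P3):
    """Eigenvalues of the h^2-scaled centered-difference matrix, from the formula above."""
    h = 1.0 / (n + 1)
    alpha, beta, gamma = P3 * h * h, P1 * h, P2 * h
    rb = cmath.sqrt(1 - beta ** 2)     # purely imaginary when beta > 1
    rg = cmath.sqrt(1 - gamma ** 2)    # purely imaginary when gamma > 1
    return [4 - alpha + 2 * rb * math.cos(s * math.pi / (n + 1))
            + 2 * rg * math.cos(t * math.pi / (n + 1))
            for s in range(1, n + 1) for t in range(1, n + 1)]

def leftmost_symmetric(n, P3):
    """Smallest eigenvalue of the symmetric part (s = t = n) and its O(h^4) estimate."""
    h = 1.0 / (n + 1)
    exact = 4 - P3 * h * h + 4 * math.cos(n * math.pi / (n + 1))
    approx = (2 * math.pi ** 2 - P3) * h * h
    return exact, approx

print(leftmost_symmetric(31, 30.0))   # both negative: P3 = 30 > 2*pi^2
```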

Six test problems corresponding to six choices of the parameter set $\{P_1, P_2, P_3\}$ are considered. We use the three values $P_3 = 30$, 80, and 250 together with each of the pairs $\{P_1 = 1, P_2 = 2\}$ and $\{P_1 = 25, P_2 = 50\}$. For all tests, $n = 31$, so that the order $N = n^2$ is 961. For all six test problems, the coefficient matrix $A$ is indefinite, and the number of negative eigenvalues of $(A + A^T)/2$ increases as $P_3$ grows. For the first choice of the $(P_1, P_2)$ pair, $A$ is mildly nonsymmetric and its eigenvalues are real. For the second choice, $A$ is more highly nonsymmetric and has complex eigenvalues.
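A matrix-free product with $A$ follows directly from the five-point equation above. The sketch below is a hypothetical helper, not code from the paper. It assumes the unknowns are stored row by row in a flat list and uses zero Dirichlet data. Applied to the vector of all ones, it gives $-\alpha = -P_3h^2$ at every interior point that has no boundary neighbors, which is a quick consistency check.

```python
def make_matvec(n, P1, P2, P3):
    """Return a function u -> A u for the h^2-scaled five-point operator.

    Unknowns are stored in a flat list with u[(j - 1) * n + (i - 1)] ~ u(ih, jh);
    boundary values are zero (g = 0), so off-grid neighbors contribute nothing.
    """
    h = 1.0 / (n + 1)
    alpha, beta, gamma = P3 * h * h, P1 * h, P2 * h

    def matvec(u):
        def U(i, j):
            return u[(j - 1) * n + (i - 1)] if 1 <= i <= n and 1 <= j <= n else 0.0
        return [(4 - alpha) * U(i, j)
                - (1 + beta) * U(i - 1, j) + (-1 + beta) * U(i + 1, j)
                - (1 + gamma) * U(i, j - 1) + (-1 + gamma) * U(i, j + 1)
                for j in range(1, n + 1) for i in range(1, n + 1)]
    return matvec

A = make_matvec(31, 1.0, 2.0, 30.0)     # P1 = 1, P2 = 2, P3 = 30
y = A([1.0] * 31 * 31)
```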

Although it is not our intention here to examine preconditioners for indefinite systems, preconditioning has been shown to be a critical factor in the performance of iterative methods [3, 5, 15]. In our tests, we precondition (1.1) by the finite difference discretization of the Laplacian. That is, the iterative methods being considered are applied to the preconditioned problem

$$AQ^{-1}\tilde u = b, \qquad u = Q^{-1}\tilde u,$$

where $Q$ is the discrete Laplacian. (See [2] for an asymptotic analysis of this preconditioner for finite element discretizations.) The preconditioned matrix-vector product then consists of a preconditioning solve of the form $Q^{-1}v$ and a matrix multiply of the form $Av$. Since $\Omega$ is a square domain, the preconditioning is implemented using the block cyclic reduction method at a cost of $3n^2\log_2 n$ operations [25]. We have confirmed numerically that the preconditioned matrix $AQ^{-1}$ in all six problems has indefinite symmetric part.
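The preconditioning solve $Q^{-1}v$ can be illustrated with a fast-Poisson-style approach. Note the substitution: the authors use block cyclic reduction, while this sketch diagonalizes the $h^2$-scaled five-point Laplacian with the discrete sine transform. For clarity it forms the transform as a dense matrix, so it costs $O(n^3)$ rather than the $O(n^2\log n)$ of a fast implementation. Unknowns are assumed stored row by row: entry `j*n + i` (0-based) corresponds to the grid point $((i+1)h, (j+1)h)$.

```python
import math

def dst_matrix(n):
    """Orthonormal DST-I matrix S (symmetric, S S = I)."""
    c = math.sqrt(2.0 / (n + 1))
    return [[c * math.sin(k * s * math.pi / (n + 1)) for s in range(1, n + 1)]
            for k in range(1, n + 1)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def laplacian_solve(v, n):
    """Return Q^{-1} v, Q the h^2-scaled five-point Laplacian with zero boundary values."""
    S = dst_matrix(n)
    V = [[v[j * n + i] for j in range(n)] for i in range(n)]   # V[i][j], i along x
    W = matmul(matmul(S, V), S)                                 # to sine coefficients
    for s in range(n):
        for t in range(n):
            W[s][t] /= (4 - 2 * math.cos((s + 1) * math.pi / (n + 1))
                          - 2 * math.cos((t + 1) * math.pi / (n + 1)))
    X = matmul(matmul(S, W), S)                                 # back to grid values
    return [X[i][j] for j in range(n) for i in range(n)]
```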

We use the following parameters for the hybrid GMRES-PSUP iteration. In an effort to obtain the dominant and subdominant eigenvalues of each quadrant at the outset, the initialization step consists of eight GMRES steps ($k_i = 8$), giving eight eigenvalue estimates. All subsequent calls to the adaptive procedure consist of four GMRES steps ($k_a = 4$). For all tests with PSUP, we use a residual polynomial of degree four ($m = 4$) and allow at most $s = 32$ PSUP steps (eight successive applications of the PSUP polynomial). The adaptive procedure is invoked if the residual norm increases during a PSUP step ($\tau = 1$), or after $s$ steps are performed. We use $M = 100$ points for the discretized enclosing set $\partial D_M$. They are allocated so that the number of points in each quadrant is approximately proportional to the perimeter of the convex hull in that quadrant. For subsets of $D$ that overlap on quadrant boundaries (e.g. if a line segment on the real line is shared by regions in the first and fourth quadrants), the shared boundary is discretized twice. The GMRES/Reduced-PSUP scheme deletes the eigenvalue estimates closest to the origin until the minimax norm is less than some tolerance $\eta$; we examine $\eta = .5$ and $.3$. For this scheme, we take $k_a$ to be two plus the number of eigenvalue estimates deleted. We use the notation GMRES-PSUP($m$) (with $m = 4$) for the “unreduced” scheme, and GMRES-PSUP($m, \eta$) for the reduced version.
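The allocation of the $M = 100$ boundary points can be sketched as follows. The helper takes a mapping from quadrant to hull vertices, such as the convex hulls $D_j$ described in Section 3. It assigns points in proportion to each hull's perimeter, using largest-remainder rounding so that the counts add up to $M$. Both the rounding rule and counting a segment's length twice (its boundary is traversed out and back) are assumptions of this sketch.

```python
import math

def perimeter(hull):
    """Length of the closed polygon through the hull vertices.

    A two-point hull (a segment) is traversed out and back, so it counts twice.
    """
    if len(hull) < 2:
        return 0.0
    return sum(math.dist(hull[k], hull[(k + 1) % len(hull)]) for k in range(len(hull)))

def allocate_points(hulls, M=100):
    """Split M boundary points among quadrants in proportion to hull perimeters.

    Largest-remainder rounding makes the counts sum exactly to M.
    Assumes at least one hull has positive perimeter.
    """
    lengths = {q: perimeter(h) for q, h in hulls.items()}
    total = sum(lengths.values())
    raw = {q: M * L / total for q, L in lengths.items()}
    counts = {q: math.floor(x) for q, x in raw.items()}
    leftover = M - sum(counts.values())
    by_remainder = sorted(raw, key=lambda q: raw[q] - counts[q], reverse=True)
    for q in by_remainder[:leftover]:
        counts[q] += 1
    return counts
```

For a 3-4-5 triangle in one quadrant (perimeter 12) and a unit segment in another (counted as 2), the 100 points split as 86 and 14.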

We  examine  GMRES(m)  for  m  =  5  and  m  =  20.  Recall  that  the  latter  version  generates 
a  higher  degree  optimal  polynomial  at  the  expense  of  a  larger  average  cost  per  step. 

All numerical tests were run on a VAX 11/780 in double precision (55-bit mantissa). The initial guess in all runs was a vector $u_0$ of random numbers between −1 and 1. Figures 1–6 show the performance of the methods, measured in multiplication counts, for the six problems (also numbered 1–6). Note that the horizontal scale of Figure 1 is wider than the others, and the scales in Figures 5 and 6 are slightly narrower. Table 1 shows the iteration counts needed to satisfy the stopping criterion on the relative residual norm $\|r_m\|_2/\|r_0\|_2$. A maximum of 100, 150, and 200 iterations was permitted for the CGN, GMRES and PSUP methods, respectively. (For these iteration counts, CGN, GMRES(20) and GMRES-PSUP(4) performed roughly the same number of operations.) Our main observations on these data are:

1. Problems 1 and 3 are solved efficiently by nearly all the methods, but for the other four problems convergence is slow.

2. In general, the hybrid GMRES-PSUP(m) scheme performs worst. The plateaus for this method in the figures (e.g. Figures 3 and 6) correspond to the PSUP steps, for which convergence is very slow. The “reduction” heuristic improves the performance, but the improvement is due largely to increased effectiveness of the GMRES part of the iteration (e.g. in the steep drops of Figures 2–4). Even so, the improved performance is not better than that of GMRES alone.

3. On the whole, GMRES(20) and CGN are the most effective methods for these problems, but they are not dramatically superior to the others. GMRES(20) converges more rapidly than GMRES(5).

Excluding storage for the matrix and right-hand side, the storage requirements for the methods considered are

    CGN:                  4N
    GMRES(5):             7N
    GMRES(20):           22N
    All PSUP variants:   10N

The high storage cost of the PSUP methods is due to the eight initializing GMRES steps.

Although the GMRES/Reduced-PSUP (PSUP($m, \eta$)) scheme is not as fast as pure GMRES, the reduction heuristic does have its intended effect of improving upon the hybrid scheme. We briefly examine the effect of the heuristic on Problem 3, focusing on two curve segments of Figure 3. The first is the plateau of curve D (GMRES-PSUP(4)) between multiplication counts 200000 and 300000. The second is the last plateau in curve E (GMRES-PSUP(4,.5)). For curve D, on return from the adaptive step at about multiplication count 200000, the real parts of the eigenvalue estimates lie in the intervals [−3, −.33] and [0.4, .38]. The Chebyshev norm of the residual polynomial is .96, and convergence is slow. For curve E, on return from the adaptive step before the last plateau of the curve, the real parts of the eigenvalue estimates lie in the intervals [−3, −.56] and [.05, .97], and the Chebyshev norm is .96. The effect of deleting points is shown in Table 2. The Chebyshev norm is very large when there are points near the origin, and it declines as these points are deleted. The deletion of points does not significantly hurt the PSUP part of the iteration, and it strongly enhances the effect of the GMRES steps.


Problem #          1       2       3       4       5       6
CGN               13    >100      28    >100    >100    >100
GMRES(5)          13    >150      46    >150    >150    >150
GMRES(20)         10     111      17     119    >150    >150
GMRES-PSUP        16    >200     199    >200    >200    >200
PSUP(4,.5)        16    >200      62    >200    >200    >200
PSUP(4,.3)        16    >200    >200    >200    >200

Table 1: Iteration counts.


Deleted Points    Intervals Containing Real Parts    Chebyshev Norm
-                 [-3, -.56], [.05, .97]             .96
.05               [-3, -.56], [.34, .97]             .76
.34               [-3, -.56], [.61, .97]             .55
-.56              [-3, -1.46], [.61, .97]            .33

Table 2: Effect of point deletion on GMRES/Reduced-PSUP(4,.5) for Problem 3.


Figure 5: $P_1 = 1$, $P_2 = 2$, $P_3 = 250$

Figure 6: $P_1 = 25$, $P_2 = 50$, $P_3 = 250$


We remark that we also considered other variants of the PSUP iteration. In experiments
with degrees m = 6 and 10, the performance of PSUP was essentially the same.* Moreover, as
we noted in Section 3, a variant of the GMRES/Reduced-PSUP in which the PSUP polynomial
is constrained to be bounded in modulus by one on the set of deleted eigenvalue estimates
displayed about the same behavior as the unconstrained version. Similarly, we tested LSQR
[16], a stabilized version of CGN, and found that its performance was nearly identical to CGN.

5. Conclusions

The GMRES and PSUP methods are iterative methods that are optimal in the class
of polynomial-based methods with respect to the Euclidean or l∞ norms, respectively, for
arbitrary nonsingular linear systems. For linear systems in which the coefficient matrix is
either symmetric or definite (or both), these types of methods are effective solution techniques
[3, 5]. In particular, they are superior to solving the normal equations by the conjugate gradient
method. In the results of Section 4, however, the methods based on polynomials in the coefficient
matrix are not dramatically superior to CGN, especially for systems that are both highly
nonsymmetric and highly indefinite. GMRES appears to be a more effective method than PSUP.
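The optimality property stated above for GMRES (each cycle minimizes the Euclidean norm of the residual over a Krylov subspace) can be illustrated with a small restarted GMRES(m) sketch. It is an illustration only, not the code used for the experiments in Section 4: it assumes a small dense matrix stored as a list of rows, uses Arnoldi with modified Gram-Schmidt and Givens rotations, and the names `gmres`, `matvec`, and `norm2` are hypothetical.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))


def gmres(A, b, m=20, tol=1e-10, max_restarts=50):
    """Restarted GMRES(m): each cycle minimizes ||b - A x||_2 over x0 + K_m(A, r0)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_restarts):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        beta = norm2(r)
        if beta <= tol:
            break
        V = [[ri / beta for ri in r]]           # orthonormal Krylov basis vectors
        H = [[0.0] * m for _ in range(m)]       # Hessenberg entries, triangularized in place
        cs, sn = [0.0] * m, [0.0] * m           # Givens rotation cosines and sines
        g = [beta] + [0.0] * m                  # rotated right-hand side beta * e1
        k = 0
        for j in range(m):
            w = matvec(A, V[j])
            for i in range(j + 1):              # Arnoldi step, modified Gram-Schmidt
                H[i][j] = sum(wi * vi for wi, vi in zip(w, V[i]))
                w = [wi - H[i][j] * vi for wi, vi in zip(w, V[i])]
            h_next = norm2(w)                   # subdiagonal entry H[j+1][j]
            for i in range(j):                  # apply earlier rotations to column j
                top = cs[i] * H[i][j] + sn[i] * H[i + 1][j]
                H[i + 1][j] = -sn[i] * H[i][j] + cs[i] * H[i + 1][j]
                H[i][j] = top
            d = math.hypot(H[j][j], h_next)     # rotation that annihilates h_next
            cs[j], sn[j] = H[j][j] / d, h_next / d
            H[j][j] = d
            g[j + 1] = -sn[j] * g[j]
            g[j] = cs[j] * g[j]
            k = j + 1
            if abs(g[j + 1]) <= tol or h_next == 0.0:
                break                           # |g[j+1]| is the current residual norm
            V.append([wi / h_next for wi in w])
        y = [0.0] * k                           # solve the k x k upper-triangular system
        for i in range(k - 1, -1, -1):
            y[i] = (g[i] - sum(H[i][q] * y[q] for q in range(i + 1, k))) / H[i][i]
        for i in range(k):
            x = [xi + y[i] * vi for xi, vi in zip(x, V[i])]
    return x
```

For example, `gmres([[1.0, 2.0], [3.0, -4.0]], [1.0, 0.0], m=2)` should solve this small nonsymmetric, indefinite system (eigenvalues 2 and -5) in a single cycle; preconditioning and the PSUP hybrid discussed above are not modeled.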

We note that the best results for other classes of problems depend strongly on preconditioning.
We used the discrete Laplacian as a preconditioner in our experiments, and the large
iteration/work counts in the results show that this is not a good choice for the given mesh size
when the coefficients in the differential operator are large. We believe that improvements in
preconditioners are needed to handle this class of problems.


* However, with degree 1C, we were unable to generate the polynomial coefficients. We believe the choice of
the powers as basis functions makes (U) ill conditioned for large m; see [19]. In addition, the implementation
based on Horner's rule may suffer from instability for large m.




References 

[1] A. Bayliss, C. I. Goldstein and E. Turkel, An iterative method for the Helmholtz equation,
Journal of Computational Physics, 49 (1983), pp. 443-457.

[2] J. H. Bramble and J. E. Pasciak, Preconditioned iterative methods for nonselfadjoint
or indefinite elliptic boundary value problems, in H. Kardestuncer, ed., Unification
of Finite Element Methods, Elsevier Science Publishers, New York, 1984, pp. 167-184.

[3] R. Chandra, Conjugate Gradient Methods for Partial Differential Equations, Ph.D. Thesis,
Department of Computer Science, Yale University, 1978. Also available as
Technical Report 129.

[4] C. de Boor and J. R. Rice, Extremal polynomials with application to Richardson iteration
for indefinite linear systems, SIAM J. Sci. Stat. Comput., 3 (1982), pp. 47-57.

[5] H. C. Elman, Iterative Methods for Large, Sparse, Nonsymmetric Systems of Linear
Equations, Ph.D. Thesis, Department of Computer Science, Yale University,
1982. Also available as Technical Report 229.

[6] H. C. Elman, Y. Saad and P. E. Saylor, A Hybrid Chebyshev Krylov-Subspace Method
for Nonsymmetric Systems of Linear Equations, Technical Report YALEU/DCS/TR-301,
Yale University Department of Computer Science, 1984. To appear in
SIAM J. Sci. Stat. Comput.

[7] R. Fletcher, Conjugate gradient methods for indefinite systems, in G. A. Watson, ed.,
Numerical Analysis Dundee 1975, Springer-Verlag, New York, 1976, pp. 73-89.

[8] L. A. Hageman and D. M. Young, Applied Iterative Methods, Academic Press, New York, 1981.

[9] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems,
Journal of Research of the National Bureau of Standards, 49 (1952), pp. 409-435.

[10] E. Hille, Volume II: Analytic Function Theory, Blaisdell, New York, 1962.

[11] K. Ito, An Iterative Method for Indefinite Systems of Linear Equations, Technical Report
NAS1-17070, ICASE, April 1984.

[12] T. Kerkhoven, On the Choice of Coordinates for Semiconductor Simulation, Technical
Report RR-350, Yale University Department of Computer Science, 1984.

[13] T. A. Manteuffel, Adaptive procedure for estimation of parameters for the nonsymmetric
Tchebychev iteration, Numer. Math., 31 (1978), pp. 187-208.

[14] T. A. Manteuffel, The Tchebychev iteration for nonsymmetric linear systems, Numer. Math., 28
(1977), pp. 307-327.

[15] J. A. Meijerink and H. A. van der Vorst, An iterative solution method for linear systems of
which the coefficient matrix is a symmetric M-matrix, Math. Comp., 31 (1977),
pp. 148-162.

[16] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and
sparse least squares, ACM Trans. on Math. Software, 8 (1982), pp. 43-71.

[17] C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear equations,
SIAM J. Numer. Anal., 12 (1975), pp. 617-629.

[18] Y. Saad, Iterative solution of indefinite symmetric systems by methods using orthogonal
polynomials over two disjoint intervals, SIAM J. Numer. Anal., 20 (1983), pp. 784-811.

[19] Y. Saad, Least squares polynomials in the complex plane with applications to solving
sparse nonsymmetric matrix problems, Technical Report 276, Yale University
Department of Computer Science, June 1983.

[20] Y. Saad and M. H. Schultz, GMRES: A Generalized Minimal Residual Algorithm for
Solving Nonsymmetric Linear Systems, Technical Report 254, Yale University
Department of Computer Science, 1983.

[21] G. D. Smith, Numerical Solution of Partial Differential Equations: Finite Difference
Methods, Oxford University Press, New York, 1978.

[22] D. C. Smolarski and P. E. Saylor, Optimum Parameters for the Solution of Linear
Equations by Richardson's Iteration, May 1982. Unpublished manuscript.

[23] R. L. Streit, An Algorithm for the Solution of Systems of Complex Linear Equations in the
l∞ Norm with Constraints on the Unknowns, 1983. Submitted to ACM Trans.
on Math. Software.

[24] R. L. Streit, Solution of Systems of Complex Linear Equations in the l∞ Norm with
Constraints on the Unknowns, Technical Report 83-3, Systems Optimization Laboratory,
Stanford University Department of Operations Research, 1983. To appear
in SIAM J. Sci. Stat. Comput.

[25] P. N. Swarztrauber, The methods of cyclic reduction, Fourier analysis and the FACR
algorithm for the discrete solution of Poisson's equation on a rectangle, SIAM
Review, 19 (1977), pp. 490-501.

[26] M. A. Saunders, H. D. Simon, and E. L. Yip, Two Conjugate-Gradient-Type Methods for
Sparse Unsymmetric Linear Equations, Technical Report ETA-TR-18, Boeing
Computer Services, June 1984.

[27] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, London, 1965.




Extremals  And  Zeros  In  Markov  Systems 
Are  Monotone  Functions  Of  One  Endpoint 

R.  L.  Streit 




Extremals  and  Zeros  In  Markov  Systems  are  Monotone 
Functions  of  One  Endpoint 

ROY  L.  STREIT 


Abstract. The synthesis of optimum field patterns for
discrete  linear  antenna  arrays  leads  naturally  to  the  study 
of  the  behavior  of  generalized  Chebyshev  polynomials  as  a 
function  of  one  endpoint  of  the  interval  of  definition.  Of 
particular  importance  in  this  application  is  the  variation  of 
the  zeros  and  the  extreme  points  of  the  generalized  Chebyshev 
polynomials  as  the  left-hand  endpoint  of  the  interval  is 
shifted  to  the  right.  All  the  zeros  and  all  the  extreme 
points  of  the  classical  Chebyshev  polynomials  defined  on  the 
intervals  [t,  b]  are  strictly  increasing  functions  of  t.  In 
Haar  Systems,  this  property  does  not  hold,  but  in  Markov 
Systems  with  unit  element  it  does.  An  apparently  new  extremal 
property  in  Haar  Systems  is  proved  and  then  used  to  show  that 
in  every  Haar  System  the  first  zero  must  be  an  increasing 
function  of  t.  If,  in  addition,  the  Haar  System  has  a  unit 
element,  then  the  first  zero  must  be  strictly  increasing. 

INTRODUCTION  AND  MOTIVATION 

Visualize any number M of fixed points in the plane, all
collinear and spaced symmetrically about the center of the
shortest line segment containing all the points. Take the
origin to be the center of this smallest line segment. If
each of these points is taken as the position of a sensor of
an antenna array, then the directional response of this array
is directly proportional to the absolute value of


THEORY  OF  APPROXIMATION 

$$P(u) = \sum_{k=1}^{N} a_k \cos(\lambda_k u), \qquad 0 \le u \le \pi, \qquad (1)$$

where $\lambda_k$ is a constant multiple of the distance of the
k-th point to the right of the origin, the variable u can be
regarded as an angle measured from a normal to the line of
points, and the coefficients $a_k$ are any real constants. The
coefficients $a_k$ are the design parameters and are chosen to
enhance the directionality of the array.

Usually, the direction perpendicular to the line of
points is the desirable direction and all other directions
are of less interest. Thus, the coefficients $a_k$ are chosen
so that P(u) has its largest magnitude near or at the
point u = 0, while keeping P(u) as small as possible
in magnitude elsewhere. With this objective in mind, the
u domain is split into two parts: the "mainlobe" region
$[0, u_0]$ and the "sidelobe" region $[u_0, \pi]$. The coefficients
$a_k$ are defined to be optimal if and only if the
ratio of $L_\infty$ norms

$$\frac{\|P(u)\|_{[0,u_0]}}{\|P(u)\|_{[u_0,\pi]}} \qquad (2)$$

is the largest possible for any coefficient set. Theorem 2
of this paper gives conditions under which this ratio is
maximized by minimizing the denominator independently of
the numerator. That is, if the set of cosines in (1) forms
a Haar System, then the optimal coefficients are proportional
to the coefficients of the generalized Chebyshev
polynomial for the interval $[u_0, \pi]$. With these optimal
coefficients, the peak point of a sidelobe is just an extreme
point of a uniform function approximation.
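The following sketch evaluates the field pattern (1) and estimates the ratio (2) by sampling. It is illustrative only: the function names, the uniform shading a_k = 1, and the choice λ_k = k - 1/2 (three symmetric pairs of sensors) are hypothetical, and sampling on a finite grid only approximates the L∞ norms.

```python
import math


def pattern(a, lam, u):
    """Field pattern P(u) = sum_k a_k cos(lam_k u), equation (1)."""
    return sum(ak * math.cos(lk * u) for ak, lk in zip(a, lam))


def sup_on(a, lam, lo, hi, samples=4001):
    """Sampled approximation of the L-infinity norm of P on [lo, hi]."""
    step = (hi - lo) / (samples - 1)
    return max(abs(pattern(a, lam, lo + i * step)) for i in range(samples))


def norm_ratio(a, lam, u0, samples=4001):
    """Sampled estimate of the ratio (2): ||P||_[0,u0] / ||P||_[u0,pi]."""
    return sup_on(a, lam, 0.0, u0, samples) / sup_on(a, lam, u0, math.pi, samples)


# Hypothetical example: uniform shading with lam_k = k - 1/2.
a = [1.0, 1.0, 1.0]
lam = [0.5, 1.5, 2.5]
print(pattern(a, lam, 0.0))             # peak value sum(a_k) = 3.0
print(norm_ratio(a, lam, math.pi / 3))  # mainlobe-to-sidelobe ratio estimate
```

With these values P(u) = sin(3u) / (2 sin(u/2)), whose first zero is u = π/3; taking u_0 = π/3, the sidelobe magnitude stays below 1, so the estimated ratio exceeds 3.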

The maximum value of the ratio (2) is therefore a function of
the parameter $u_0$ dividing the mainlobe and the sidelobe
regions. In engineering applications, it is important to
know how this ratio changes with $u_0$ and how this change
shows up in the actual field pattern (1). Theorem 3 gives
necessary and sufficient conditions for the maximum value of
the ratio (2) to be a strictly increasing function of $u_0$,
while Theorem 1 gives conditions under which all the zeros
and all the sidelobes of the optimal field pattern shift
strictly to the right with increasing $u_0$. Finally, since
the first zero of P(u) is sometimes used as a measure of
the mainlobe region $[0, u_0]$, Theorem 4 gives weaker
conditions under which this measure is also strictly increasing
with increasing $u_0$.

Pokrovskii [1] studied this problem in detail, but for
equispaced sensors only. In the equispaced case, a transformation
of variables reduces the problem to a study of ordinary
polynomials. An explicit solution to the maximization of the
ratio (2) for equispaced sensors was given by Dolph [2]. A
generalization of Dolph's method to symmetrically spaced
sensors can be found in Streit [3], [4].

THE MATHEMATICS

Let $T_n(u) = \cos(n \cos^{-1} u)$ be the Chebyshev polynomial
of degree n on the closed interval [-1, 1]. Then
the Chebyshev polynomial on the closed interval [t, b] is
just

$$T_n\left(\frac{2x - b - t}{b - t}\right),$$

which has the n zeros (counting from left to right)

$$z_k(t) = \frac{t}{2}\left[1 - \cos\frac{(n - k + \tfrac{1}{2})\pi}{n}\right] + \frac{b}{2}\left[1 + \cos\frac{(n - k + \tfrac{1}{2})\pi}{n}\right], \qquad k = 1, \ldots, n,$$

and the (n + 1) extremal points

$$x_k(t) = \frac{t}{2}\left[1 - \cos\frac{(n - k + 1)\pi}{n}\right] + \frac{b}{2}\left[1 + \cos\frac{(n - k + 1)\pi}{n}\right], \qquad k = 1, \ldots, n + 1.$$

By inspection, except for $x_{n+1}(t) = b$, all the zeros and all
the extremals are strictly increasing continuous functions of
the left-hand endpoint t. Theorem 1 extends this property
to more general spaces.
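The closed forms for $z_k(t)$ and $x_k(t)$ above can be checked numerically. This minimal sketch (all function names are hypothetical) evaluates $T_n$ by its three-term recurrence, confirms that the shifted polynomial vanishes at $z_k(t)$ and has modulus one at $x_k(t)$, and compares two left-hand endpoints to exhibit the monotone dependence on t.

```python
import math


def cheb(n, y):
    """T_n(y) via the recurrence T_{k+1} = 2 y T_k - T_{k-1}."""
    t0, t1 = 1.0, y
    if n == 0:
        return t0
    for _ in range(n - 1):
        t0, t1 = t1, 2.0 * y * t1 - t0
    return t1


def shifted_cheb(n, t, b, x):
    """Chebyshev polynomial on [t, b]: T_n((2x - b - t) / (b - t))."""
    return cheb(n, (2.0 * x - b - t) / (b - t))


def zeros(n, t, b):
    """z_k(t), k = 1..n, in increasing order."""
    out = []
    for k in range(1, n + 1):
        c = math.cos((n - k + 0.5) * math.pi / n)
        out.append(0.5 * t * (1.0 - c) + 0.5 * b * (1.0 + c))
    return out


def extremals(n, t, b):
    """x_k(t), k = 1..n+1, in increasing order; x_1 = t and x_{n+1} = b."""
    out = []
    for k in range(1, n + 2):
        c = math.cos((n - k + 1) * math.pi / n)
        out.append(0.5 * t * (1.0 - c) + 0.5 * b * (1.0 + c))
    return out


# Moving t to the right moves every zero and every extremal except x_{n+1} = b.
print(zeros(5, -1.0, 1.0))
print(zeros(5, 0.4, 1.0))
```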

Let C[a, b] be the linear space of all real valued and
continuous functions on the closed finite interval [a, b].
Meinardus [5] defines the finite or infinite sequence of
linear subspaces $V_n$, n = 0, 1, 2, ..., of C[a, b] to
be a Haar System provided it has the three properties:

i. $V_n \subset V_{n+1}$.

ii. The dimension of $V_n$ is n + 1.

iii. $V_n$ satisfies the Haar condition, i.e., $f \in V_n$
and $f \not\equiv 0$ implies that f has at most n zeros
in [a, b].

Define $V_n(t)$ to be that linear subspace of C[t, b] obtained
by restricting every function in $V_n$ to the interval
$[t, b] \subset [a, b]$. Thus, $V_n(t)$ forms a Haar System for each
$t \in [a, b)$. For $f \in C[a, b]$, define

$$\|f\|_e = \max_{x \in e} |f(x)|, \qquad e \subset [a, b],$$

and

$$\rho_n(f; t) = \min_{g \in V_n(t)} \|f - g\|_{[t,b]}.$$

A Haar System with unit element is a Haar System for which
$V_0$ contains the constant functions. Zielke [6] proves the
following lemma in a more general form.


Lemma 1. If $V_n$ is a Haar System with unit element on the
closed interval [a, b], and if $f \in V_n \setminus V_{n-1}$, then there
exist at most (n + 1) points $x_k \in [a, b]$ such that f is
strictly monotone on each interval $[x_k, x_{k+1}]$, k = 1,
2, ..., n.

Define $S_0^{(t)}(x) = 1$ for all x and t. The following
lemma, due to Meinardus [7], [5], is an immediate consequence
of de la Vallée Poussin's Theorem and Lemma 1.

Lemma 2. For a Haar System $V_n$ with unit element, the
functions $S_n^{(t)}$ satisfying the conditions

a. $S_n^{(t)} \in V_n(t)$,

b. $\|S_n^{(t)}\|_{[t,b]} = 1$,

c. $\rho_{n-1}(S_n^{(t)}; t) = 1$,

d. $S_n^{(t)}(b) = 1$,

have the following properties:

1. $S_n^{(t)}$ is uniquely defined for each $n \ge 0$ and
$t \in [a, b)$.

2. $S_n^{(t)}$ possesses precisely n + 1 extremal points
$x_k(t)$ (k = 1, 2, ..., n + 1) in the interval [t, b].
The points t and b are extremal points, and
arranging the points in increasing order

$$t = x_1(t) < x_2(t) < \cdots < x_n(t) < x_{n+1}(t) = b,$$

we have the alternating property

$$S_n^{(t)}(x_k(t)) + S_n^{(t)}(x_{k+1}(t)) = 0 \qquad (k = 1, 2, \ldots, n).$$

3. $S_n^{(t)}$ is strictly monotone (increasing or decreasing)
in $x_k(t) \le x \le x_{k+1}(t)$ (k = 1, 2, ..., n).
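For the polynomial Haar system {1, x, ..., x^n}, which has a unit element, $S_n^{(t)}$ is the Chebyshev polynomial of the preceding section mapped onto [t, b]: it equioscillates n + 1 times with modulus one and takes the value 1 at b. Under that assumption, the sketch below samples properties 2 and 3 of Lemma 2, namely unit-modulus alternation at the extremal points and strict monotonicity between consecutive extremal points. The helper names and the sampling density are hypothetical.

```python
import math


def s_poly(n, t, b, x):
    """S_n^(t) for {1, x, ..., x^n}: T_n mapped onto [t, b]."""
    y = (2.0 * x - b - t) / (b - t)
    y = max(-1.0, min(1.0, y))  # guard against rounding just outside [-1, 1]
    return math.cos(n * math.acos(y))


def extremal_points(n, t, b):
    """x_k(t), k = 1..n+1, in increasing order."""
    return [0.5 * t * (1.0 - math.cos((n - k + 1) * math.pi / n))
            + 0.5 * b * (1.0 + math.cos((n - k + 1) * math.pi / n))
            for k in range(1, n + 2)]


def check_lemma2(n, t, b, samples=200):
    xs = extremal_points(n, t, b)
    vals = [s_poly(n, t, b, x) for x in xs]
    # Property 2: modulus one, alternating signs, and S(b) = 1 (condition d).
    alternating = all(abs(vals[k] + vals[k + 1]) < 1e-9 for k in range(n))
    unit = all(abs(abs(v) - 1.0) < 1e-9 for v in vals) and abs(vals[-1] - 1.0) < 1e-9
    # Property 3: strictly monotone between consecutive extremal points (sampled).
    monotone = True
    for lo, hi in zip(xs, xs[1:]):
        seg = [s_poly(n, t, b, lo + (hi - lo) * i / samples) for i in range(samples + 1)]
        diffs = [q - p for p, q in zip(seg, seg[1:])]
        if not (all(d > 0 for d in diffs) or all(d < 0 for d in diffs)):
            monotone = False
    return alternating and unit and monotone


print(check_lemma2(4, -0.5, 1.0))
```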


Lemma 1 does not imply that $V_n'$ is a Haar System if
$V_n$ contains only continuously differentiable functions. Let
$V_0 = \{1\}$ and $V_1 = \{1, h\}$, where h is any continuously
differentiable strictly increasing function on [a, b] that
has, say, 15 inflection points in (a, b) at which h' vanishes.
Then $V_1$ is a continuously differentiable Haar System, but every
nonzero function in $V_1'$ has 15 zeros in (a, b).
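One concrete function for this counterexample, chosen here for illustration and not taken from the text, is h(x) = x - sin x on (a, b) = (-π, 29π). Its derivative 1 - cos x is nonnegative and vanishes exactly at x = 2πj, j = 0, ..., 14, where h'' = sin x changes sign, so these 15 inflection points are stationary (h also has non-stationary inflection points at the odd multiples of π). Because h is strictly increasing, every nonzero c_0 + c_1 h has at most one zero, so {1, h} is Haar, while every nonzero multiple of h' has 15 zeros. The sketch checks these facts on a grid; the names `A`, `B`, and the helpers are hypothetical.

```python
import math

A, B = -math.pi, 29.0 * math.pi   # hypothetical interval (a, b)


def h(x):
    """Strictly increasing function with stationary inflection points."""
    return x - math.sin(x)


def dh(x):
    """h'(x) = 1 - cos x >= 0."""
    return 1.0 - math.cos(x)


def strictly_increasing_on_grid(f, lo, hi, step=0.01):
    """Sampled check that f(x_{i+1}) > f(x_i) on a uniform grid."""
    n = int((hi - lo) / step)
    xs = [lo + i * step for i in range(n + 1)]
    return all(f(q) > f(p) for p, q in zip(xs, xs[1:]))


# Zeros of h' inside (A, B): x = 2*pi*j for the integers j with A < 2*pi*j < B.
zeros_of_dh = [2.0 * math.pi * j for j in range(-10, 20) if A < 2.0 * math.pi * j < B]
print(len(zeros_of_dh))
```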


The system $V_n$ is defined here to be a Markov System
with unit element on the closed interval [a, b] if $V_n$ is
a Haar System of functions continuous on [a, b], having
continuous derivative on the open interval (a, b), and with the
property that the spaces $V_n'$ spanned by the derivatives of
functions in $V_n$ form a Haar System on (a, b). Therefore,
$f \in V_n'$ and $f \not\equiv 0$ implies that f has at most
n - 1 zeros in (a, b). Let $V_n'(t)$ be the restriction of
$V_n'$ to (t, b). We now prove an extension of Lemma 2 to
Markov Systems with unit element.


Theorem 1. Let $V_n$ be a Markov System with unit element on
the closed interval [a, b]. Then the functions $S_n^{(t)}$ satisfying
conditions a. through d. of Lemma 2 also have the following
additional properties:

4. Each function $x_k(t)$ (k = 1, ..., n) is a continuous
and strictly monotonically increasing function of t.

5. The zeros $z_k(t)$ (k = 1, ..., n) of $S_n^{(t)}$,
arranged in the increasing order

$$t < z_1(t) < z_2(t) < \cdots < z_{n-1}(t) < z_n(t) < b,$$

are all continuous and strictly monotonically
increasing functions of t.

6. $S_n^{(t)}(x)$ is strictly monotone (increasing or decreasing)
in the interval $a \le x \le x_2(t)$.

Proof. To prove property 6, note that Rolle's Theorem implies
that $[S_n^{(t)}(x)]'$ has exactly n - 1 zeros in the interval
$[x_2(t), x_n(t)]$, and so must have none in the interval
$[a, x_2(t))$, because any function in $V_n'$ has at most n - 1
zeros in (a, b). Thus, $S_n^{(t)}$ must be monotone in the
interval $[a, x_2(t)]$. The continuity of $x_k(t)$ and $z_k(t)$
(k = 1, 2, ..., n) follows from a remark in Meinardus
[5, p. 85].

Now, fix $t \in [a, b)$. By continuity, there exists δ > 0
such that, for each 0 < ε < δ, the sets $I_k(\varepsilon)$ defined by

$$I_k(\varepsilon) = \begin{cases} [x_k(t),\, x_k(t+\varepsilon)], & \text{if } x_k(t) < x_k(t+\varepsilon), \\ \{x_k(t)\}, & \text{if } x_k(t) = x_k(t+\varepsilon), \\ [x_k(t+\varepsilon),\, x_k(t)], & \text{if } x_k(t) > x_k(t+\varepsilon), \end{cases} \qquad (k = 1, 2, \ldots, n + 1)$$

are pairwise disjoint. Put

$$S(x;\varepsilon) = S_n^{(t)}(x) - S_n^{(t+\varepsilon)}(x).$$

Because of property 6, $S(x;\varepsilon)$ has no zero in $I_1(\varepsilon)$ and
precisely one zero in each of the remaining intervals. Thus
the intervals $I_2(\varepsilon), \ldots, I_{n+1}(\varepsilon)$ contain all the zeros
of $S(x;\varepsilon)$. Because of property 2,

$$x_1(t) = t < t + \varepsilon = x_1(t + \varepsilon).$$

Let $j \le n + 1$ be the smallest integer such that there
exists ε with 0 < ε < δ and $x_j(t + \varepsilon) \le x_j(t)$. If strict
inequality holds, then the interval $(x_{j-1}(t + \varepsilon), x_j(t + \varepsilon))$
is disjoint from the intervals $I_{j-1}(\varepsilon)$ and $I_j(\varepsilon)$,
but must contain a zero of $S(x;\varepsilon)$. On the other
hand, if $x_j(t + \varepsilon) = x_j(t)$, then $S'(x;\varepsilon)$ has a zero at
$x_j(t)$. But Rolle's Theorem already gives $S'(x;\varepsilon)$ at
least n - 1 zeros, one between each consecutive pair of zeros
of $S(x;\varepsilon)$. Therefore, $S'(x;\varepsilon)$ contradicts the hypothesis
on $V_n'(t)$. Hence, j = n + 1 and property 4 follows.
Finally, property 5 is an immediate consequence of property 4,
for otherwise the function $S(x;\varepsilon)$ will have too many zeros.
This completes the proof.

The next object is to weaken the hypotheses of Theorem 1,
specifically, by dropping the unit element and the
differentiability conditions. One method of achieving this end leads
to Theorem 2. First, define $\{h_0, h_1, \ldots, h_n\}$ to be a
basis for the Haar System if $\{h_0, h_1, \ldots, h_k\}$
is a basis for $V_k$, k = 0, 1, ..., n. Also, the functions
$S_n^{(t)}$ are defined above only for Haar Systems with unit element.
For the general Haar System, define $S_n^{(t)} = c(t)\,(h_n + g^*)$,
where $g^* \in V_{n-1}(t)$ and the constants c(t) are uniquely
defined by

$$[c(t)]^{-1} = \|h_n + g^*\|_{[t,b]} = \min_{g \in V_{n-1}(t)} \|h_n + g\|_{[t,b]}.$$


Theorem 2. Let $V_n$ be a Haar System on the closed interval
[a, b]. Let $\{h_0, h_1, \ldots, h_n\}$ be a basis for $V_n$. For
fixed r and s satisfying $a \le r \le s \le t < b$, define

$$M_n(t) = \max_{g \in V_{n-1}} \frac{\|h_n + g\|_{[r,s]}}{\|h_n + g\|_{[t,b]}}.$$

Then

a. the ratio of norms is maximized by the best
approximation to $h_n$ on [t, b], namely $g^* \in V_{n-1}$;

b. $M_n(t) = \|S_n^{(t)}\|_{[r,s]}$;

c. $M_n(t)$ is a continuous increasing function of t.

If, in addition, $V_n$ has a unit element, then also

b'. $M_n(t) = |S_n^{(t)}(r)|$.
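For the polynomial Haar system {1, x, ..., x^n} on [a, b], which has a unit element, $S_n^{(t)}$ is the Chebyshev polynomial mapped onto [t, b]. Since r ≤ t lies outside (t, b], part b' then gives the closed form $M_n(t) = |T_n((2r - b - t)/(b - t))| = \cosh(n\,\mathrm{arccosh}\,|(2r - b - t)/(b - t)|)$. The sketch below, with hypothetical values n = 6, r = 0, b = 1, tabulates this closed form and checks that it increases with t, as part c asserts.

```python
import math


def cheb_abs_outside(n, y):
    """|T_n(y)| for |y| >= 1, using |T_n(y)| = cosh(n arccosh |y|)."""
    return math.cosh(n * math.acosh(abs(y)))


def M(n, r, b, t):
    """M_n(t) = |S_n^(t)(r)| for the polynomial system, assuming r <= t < b."""
    y = (2.0 * r - b - t) / (b - t)   # y <= -1 whenever r <= t
    return cheb_abs_outside(n, y)


n, r, b = 6, 0.0, 1.0   # hypothetical values
ts = [0.0, 0.1, 0.2, 0.3, 0.4]
values = [M(n, r, b, t) for t in ts]
print(values)           # starts at 1.0 when t = r and then grows
```

In array terms this is the familiar behavior of Dolph-Chebyshev designs: widening the mainlobe region buys a larger mainlobe-to-sidelobe ratio.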

Proof. Suppose $h \in V_n$, $h \notin V_{n-1}$, and

$$\frac{\|h\|_{[r,s]}}{\|h\|_{[t,b]}} > \frac{\|S_n^{(t)}\|_{[r,s]}}{\|S_n^{(t)}\|_{[t,b]}}. \qquad (3)$$

There exists a constant k > 0 such that

$$\|kh\|_{[r,s]} = \|S_n^{(t)}\|_{[r,s]},$$

so that, because of (3),

$$\|kh\|_{[t,b]} < \|S_n^{(t)}\|_{[t,b]}. \qquad (4)$$

The alternating properties of $S_n^{(t)}$ (de la Vallée Poussin's
Theorem) imply that both of the functions

$$S_n^{(t)}(x) \pm kh(x) \in V_n(t) \qquad (5)$$

have n zeros in the open interval (t, b). But because of
(4), one of these functions also has a zero in the interval
[r, s]. Thus, one of the functions (5) violates the Haar
condition on $V_n$.

This contradiction establishes that either $h = S_n^{(t)}$ or
that (3) is an equality. Either way, $S_n^{(t)}$ maximizes the
ratio of norms and part (a) is established. Part (b)
follows easily from part (a). The continuity of $M_n(t)$ follows
from part (b). To prove that $M_n(t)$ is increasing, suppose
there exist $\beta > \alpha \ge s$ such that $M_n(\beta) < M_n(\alpha)$. Then for
some constant $k \ne 0$,

$$\|kS_n^{(\alpha)}\|_{[r,s]} = \|S_n^{(\beta)}\|_{[r,s]} \quad\text{and}\quad \|kS_n^{(\alpha)}\|_{[\beta,b]} < \|S_n^{(\beta)}\|_{[\beta,b]}.$$

Hence, both of the functions

$$kS_n^{(\alpha)}(x) \pm S_n^{(\beta)}(x)$$

have n zeros on the interval [β, b]. But one of them also has
a zero in the interval [r, s], contrary to the assumption that
$V_n$ is a Haar System. Thus, part (c) is established. If $V_n$
has a unit element, then Lemma 1 guarantees
that $S_n^{(t)}$ is monotone in [r, s], so that (b') must be true.
This completes the proof.

Theorem 3. Under the assumptions of Theorem 2, for all
$\beta > \alpha \ge s$,

$$M_n(\alpha) = M_n(\beta) \qquad (6)$$

if and only if

$$S_n^{(\alpha)}(x) = S_n^{(\beta)}(x) \qquad (7)$$

for each $x \in [a, b]$.

Proof. If (7) holds, then (6) follows from Theorem 2. Suppose
then that (6) holds, but that (7) does not. Let $x_L(t)$ denote
the (n + 1)st extremal point of $S_n^{(t)}$ counted from the
right-hand endpoint to the left. If $h = x_L(\alpha) > \alpha$, then
$S_n^{(\alpha)} = S_n^{(t)}$ for all $t \in [\alpha, h]$, so that (7) holds if
$\beta \le h$. Hence, without loss of generality, we may take

$$x_L(\alpha) = \alpha < \beta = x_L(\beta).$$

Also, because $x_L(t)$ is continuous and $M_n(t)$ is monotone,
it may be assumed without loss of generality that $x_L(\beta)$
is strictly less than the first zero of $S_n^{(\alpha)}$ and that
$S_n^{(\beta)}(\beta)$ is of the same sign as $S_n^{(\alpha)}(\beta)$.

Define the functions $T_n^{(t)}(x) = S_n^{(t)}(x)/c(t)$. (Thus, the
coefficient of $h_n$ is always unity.) Because $T_n^{(\beta)} \ne T_n^{(\alpha)}$,
the function $T(x) = T_n^{(\beta)}(x) - T_n^{(\alpha)}(x)$ lies in $V_{n-1}$ and
cannot be identically zero. Because [β, b] is a subset of
[α, b] and $T(x) \not\equiv 0$,

$$\|T_n^{(\beta)}\|_{[\beta,b]} < \|T_n^{(\alpha)}\|_{[\alpha,b]}, \qquad (8)$$

and because of (6),

$$\|T_n^{(\beta)}\|_{[r,s]} < \|T_n^{(\alpha)}\|_{[r,s]}. \qquad (9)$$

Now if $|T_n^{(\beta)}(\beta)| \le |T_n^{(\alpha)}(\beta)|$, then T(x) has n zeros
in [β, b] because of de la Vallée Poussin's Theorem and
(8), contradicting $T(x) \in V_{n-1}$ and $T(x) \not\equiv 0$. Therefore,
$|T_n^{(\beta)}(\beta)| > |T_n^{(\alpha)}(\beta)|$ and T(x) has only n - 1 zeros in
[β, b]. However, T(x) has another zero in [r, β) because
of the Intermediate Value Theorem, (9), and the fact that
$T_n^{(\beta)}(\beta)$ and $T_n^{(\alpha)}(\beta)$ are of the same sign. Thus, $T(x) \in V_{n-1}$
and $T(x) \not\equiv 0$ is a contradiction. This completes the proof.


Corollary. Under the assumptions of Theorem 2, if $V_n$ has
a unit element, then also

c'. $M_n(t)$ is strictly increasing in t.

Proof. $S_n^{(\alpha)} = S_n^{(\beta)}$ cannot occur for $\alpha \ne \beta$ because α and
β must be extremal points of $S_n^{(\alpha)}$ and $S_n^{(\beta)}$, respectively.

The next theorem is the weakened form of Theorem 1 that
was sought earlier.

Theorem 4. Let $V_n$ be any Haar System, and let $\{h_0, h_1, \ldots, h_n\}$
be a basis for the system. Let $z_1(t)$ be the smallest zero
in the interval [t, b] of $S_n^{(t)}$. Then $z_1(t)$ is a
monotonically increasing function of t. Furthermore, if $V_n$
has a unit element, then $z_1(t)$ is strictly monotone.

Proof. Consider first the case for $V_n$ without a unit
element. Suppose there exist $\beta > \alpha \ge a$ such that
$z_1(\beta) < z_1(\alpha)$. Therefore, $S_n^{(\alpha)} \ne S_n^{(\beta)}$ and, by Theorem 3,
$M_n(\alpha) \ne M_n(\beta)$. Thus, there is a constant $k \ne 0$ such that

$$|kS_n^{(\beta)}(\alpha)| = |S_n^{(\alpha)}(\alpha)| \quad\text{and}\quad \|kS_n^{(\beta)}\|_{[\beta,b]} < \|S_n^{(\alpha)}\|_{[\alpha,b]}. \qquad (10)$$

Because $z_1(\beta) < z_1(\alpha)$, de la Vallée Poussin's Theorem
guarantees that both the functions

$$S_n^{(\alpha)}(x) \pm kS_n^{(\beta)}(x) \qquad (11)$$

have n zeros on [β, b], while (10) guarantees that one of
them has a zero at x = α. Therefore, one of the functions
(11) violates the Haar condition of $V_n$. In the case where
$V_n$ has a unit element, β > α guarantees by Lemma 2 that
$S_n^{(\alpha)} \ne S_n^{(\beta)}$. The supposition that $z_1(\beta) \le z_1(\alpha)$ leads
to a contradiction in the same manner as before. This
completes the proof.

Finally, we note that Theorem 2 fails if $V_n$ does not
satisfy the Haar condition.

Example. The linear space spanned by the functions {1, sin x}
on the interval [0, π] is not Haar (for instance, sin x - 1/2
has the two zeros π/6 and 5π/6). For [r, s] = [0, π/2],

$$\frac{\|\sin x + a\|_{[0,\pi/2]}}{\|\sin x + a\|_{[\pi/2,\pi]}} = 1$$

for all constants a, including a = -1/2, for which 1/2 is the
constant of best approximation to sin x on the interval
[π/2, π]. When the interval [r, s] is replaced by the
point x = π/6, then

$$\frac{|\sin(\pi/6)|}{\|\sin x\|_{[\pi/2,\pi]}} = \frac{1}{2} > 0 = \frac{|\sin(\pi/6) - \tfrac{1}{2}|}{\|\sin x - \tfrac{1}{2}\|_{[\pi/2,\pi]}}.$$

Thus, the results of Theorem 2 do not hold and $M_n(t)$ is
not achieved by minimizing the denominator.
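The numbers in this Example are easy to confirm by sampling. The sketch below (helper names hypothetical) estimates the sup norms on uniform grids, checks that the ratio over [0, π/2] equals 1 for several constants a, and compares a = 0 with the best-approximation choice a = -1/2 at the single point π/6.

```python
import math


def sup_norm(f, lo, hi, samples=10001):
    """Sampled approximation of max |f| on [lo, hi]."""
    return max(abs(f(lo + (hi - lo) * i / (samples - 1))) for i in range(samples))


def ratio(a, lo, hi):
    """Sampled ||sin x + a||_[lo,hi] / ||sin x + a||_[pi/2, pi]."""
    f = lambda x: math.sin(x) + a
    return sup_norm(f, lo, hi) / sup_norm(f, math.pi / 2, math.pi)


# [r, s] = [0, pi/2]: the ratio is 1 for every constant a.
ratios = [ratio(a, 0.0, math.pi / 2) for a in (-2.0, -0.5, 0.0, 0.3, 1.0)]

# [r, s] = {pi/6}: a = 0 beats the best-approximation choice a = -1/2.
at_point = [abs(math.sin(math.pi / 6) + a)
            / sup_norm(lambda x: math.sin(x) + a, math.pi / 2, math.pi)
            for a in (0.0, -0.5)]
print(ratios)
print(at_point)
```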




Acknowledgment. I would like to thank the referee for suggesting
a helpful change in notation, and for pointing out the
articles by Zielke and Pokrovskii.

REFERENCES

[1] V. L. Pokrovskii, "On a class of polynomials with extremal
properties," AMS Translations, vol. 19, 1962,
pp. 199-219.

[2] C. L. Dolph, "A current distribution for broadside arrays
which optimizes the relationship between beam width and
side-lobe level," Proc. IRE Waves Electrons, vol. 34,
June 1946, pp. 335-348.

[3] R. Streit, "Sufficient conditions for the existence of
optimum beam patterns for unequally spaced linear arrays,"
IEEE Trans. Antennas Propag., vol. AP-23, no. 1, Jan. 1975,
pp. 112-115.

[4] R. Streit, "Optimized symmetric discrete line arrays,"
IEEE Trans. Antennas Propag. (to appear November 1975).

[5] G. Meinardus, Approximation of Functions: Theory and
Numerical Methods, Springer-Verlag, New York, 1967.

[6] R. Zielke, "Alternation properties of Tchebyshev-systems
and the existence of adjoined functions," J. Approximation
Theory, vol. 10, 1974, pp. 172-184.

[7] G. Meinardus, "Über Tschebyscheffsche Approximationen,"
Arch. Rat. Mech. Anal., vol. 9, 1962, pp. 329-351.




Concertina-Like  Movement 
In  The  Absence  Of  A  Chebyshev  System 

J.  T.  Lewis  and  R.  L.  Streit 




Note 


Concertina-Like  Movement  in 
the  Absence  of  a  Chebyshev  System 

James  T.  Lewis* 

Mathematics Department, University of Rhode Island, Kingston, Rhode Island 02881, USA

AND

Roy L. Streit

Naval Underwater Systems Center,

New London Laboratory, New London, Connecticut 06320, USA

Communicated by Oved Shisha

Received October 30, 1981; revised February 4, 1982


Introduction 

Meinardus [1, p. 29] defined functions S(x) having certain oscillatory and
best approximation properties on an interval [a, b]. The most notable
example is the Chebyshev polynomial of the first kind, $T_n(x)$. In [2], Streit
studied the dependence of S(x) on the left endpoint, a, of the interval and
discussed an application to the design of linear antenna arrays. The dependence
on the endpoint was further investigated by Zielke [3], who obtained
stronger results. We will summarize briefly some of the theory and then
present an example to settle a certain question.


Properties of $S_t(x)$

Let [a, b] be a finite real interval, n a positive integer, and $h_1 = 1, h_2, \ldots, h_n, f$
real continuous functions on [a, b] such that $\{1, h_2, \ldots, h_n\}$ is a Chebyshev
system of degree n on [a, b] (i.e., $\sum_{i=1}^{n} a_i h_i$ has at most n - 1 zeros in [a, b]
unless $a_1 = 0, \ldots, a_n = 0$). Assume also that $\{1, h_2, \ldots, h_n, f\}$ is a Chebyshev
system of degree n + 1 on [a, b]. Let a < t < b and let $p_t(x)$ denote the best
uniform approximation to f(x) on [t, b] by a linear combination of $1,
h_2, \ldots, h_n$. Then [1, p. 29], $f - p_t$ has exactly n + 1 extremals of alternating
sign and equal magnitude, which include the endpoints t and b, and $f - p_t$ is
a strictly monotone function of x between these extremals. Define

$$S_t(x) = \pm\,[f(x) - p_t(x)] \Big/ \max_{t \le x \le b} |f(x) - p_t(x)|,$$

where the sign is chosen so that $S_t(b) = +1$.

* The work of this author was performed while he was a summer employee of the Naval
Underwater Systems Center, New London, Connecticut, U.S.A.

If $\{1, h_2, \ldots, h_n, f\}$ is $\{1, x, \ldots, x^n\}$ and [a, b] = [-1, 1], then

$$S_t(x) = T_n\left(\frac{2x}{1 - t} - \frac{1 + t}{1 - t}\right).$$


Motivated by results obtained from the application of the shifted Chebyshev
polynomials to linear antenna arrays, Streit [2] studied for the general case
the movement of the zeros and extremals of $S_t$ as a function of t. In [4]
Zielke showed that the entire graph of $S_t$ moves to the right as t increases
(concertina-like movement) except possibly the extremal points. They, too,
must move to the right if the derivatives $\{h_2', \ldots, h_n', f'\}$ form a Chebyshev
system of degree n on (a, b). Of course, the right-hand endpoint of the graph
stays fixed at $(b, S_t(b)) = (b, 1)$. We summarize the known properties of $S_t$.
For each t such that a < t < b:

(a) $S_t$ is a linear combination of $1, h_2, \ldots, h_n, f$.

(b) $\max_{t \le x \le b} |S_t(x)| = 1$.

(c) The best uniform approximation to $S_t$ on [t, b] by a linear
combination of $\{1, h_2, \ldots, h_n\}$ is 0.

(d) $S_t(x)$ has n + 1 extremals of alternating sign and equal magnitude,
which include the endpoints t and b, and $S_t(x)$ is a strictly monotone
function of x between the extremals.

(e) $S_t(b) = 1$.

(f) $S_t$ satisfying (a)-(c) is unique.

(g) The graph of $S_t$ moves to the right as t increases (except for the
fixed right-hand endpoint); i.e., $a < t_1 < t_2 < b$, α in [-1, 1], and 1 ≤ k ≤ n
imply that the smallest z such that $S_{t_1}(x) = \alpha$ for k distinct points in $(t_1, z)$
is strictly less than the smallest z such that $S_{t_2}(x) = \alpha$ for k distinct points in
$(t_2, z)$.


The  Example 

Proof of the existence of $S_t$ with the nice properties (a)–(g) relies heavily on the fact that $\{1, h_2, \ldots, h_n, f\}$ is a Chebyshev system. We were curious as to whether a system could give rise to an $S_t$ satisfying (a)–(g) without being a Chebyshev system. Clearly this is impossible for $\{1, f\}$, since $c_1 f - c_2$ is strictly monotone between the extremals $a$ and $b$ only if $f$ is (and hence $\{1, f\}$ forms a Chebyshev system). However, we did construct an example $\{1, h_2, f\}$, which we now present.

Example. Let $h_2(x) = x$, $f(x) = x^3$, and $[a, b] = [-\tfrac12, 1]$. Then $\{1, x, x^3\}$ is not a Chebyshev system on $[-\tfrac12, 1]$ since, for example, $p(x) = x(x^2 - \tfrac14)$ has zeros at $-\tfrac12, 0, \tfrac12$. We will now show that $S_t$ exists such that properties (a)–(g) are satisfied. Letting $-\tfrac12 \le t < 1$, $E_t(x) = x^3 - (a_t + b_t x)$, and using $t, x_t, 1$ as a reference set gives the equations

$$\begin{aligned} E_t(t) &= t^3 - (a_t + b_t t) = d_t,\\ E_t(x_t) &= x_t^3 - (a_t + b_t x_t) = -d_t, \qquad (1)\\ E_t(1) &= 1 - (a_t + b_t) = d_t. \end{aligned}$$

Subtracting the third equation from the first gives $t^3 - 1 - b_t(t - 1) = 0$, i.e., $b_t = t^2 + t + 1$. Now

$$\frac{d}{dx}E_t(x) = 3x^2 - b_t = 3x^2 - (t^2 + t + 1) = 0 \quad \text{when } x = x_t.$$

Hence, $x_t = [(t^2 + t + 1)/3]^{1/2}$. Substituting $x_t$ and $b_t$ into Eqs. (1), one could solve uniquely for $a_t$ and $d_t$ in terms of $t$ and observe that $d_t > 0$; we omit the details. Considering $dE_t/dx$ and using $t \ge -\tfrac12$, we see that $E_t(x)$ is strictly decreasing in $[t, x_t]$ and strictly increasing in $[x_t, 1]$. Hence, the characterization theorem guarantees that $a_t + b_t x$ obtained from solving (1) is the unique best uniform approximation to $x^3$ on $[t, 1]$.

Then, for $-\tfrac12 \le t < 1$, $S_t(x) = (1/d_t)[x^3 - (a_t + b_t x)]$ satisfies (a)–(f). Now, let $-\tfrac12 \le t_1 < t_2 < 1$. Since $x_t$ is strictly increasing as a function of $t$, $x_{t_1} < x_{t_2}$. Clearly $S_{t_1}(x) - S_{t_2}(x)$ has a zero in $(x_{t_1}, x_{t_2})$ and a zero at $x = 1$. If $S_{t_1} - S_{t_2}$ has no other zeros in $(t_2, 1]$, then (g) will be satisfied. Assume the opposite; then by Rolle's theorem $d[S_{t_1}(x) - S_{t_2}(x)]/dx$ has at least two zeros, say $z_1 < z_2$, in $(t_2, 1)$ with $z_2 > x_{t_1} \ge x_{-1/2} = \tfrac12$. Hence, $z_2 \ne -z_1$, which is impossible since $d[S_{t_1}(x) - S_{t_2}(x)]/dx$ has the form $c_1 x^2 + c_2$, whose two distinct zeros (if any) are negatives of each other. This completes the verification of (a)–(g) for the example.
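The reference-set construction above is explicit enough to evaluate directly. This sketch (plain Python; the function names and the sample values of $t$ are illustrative assumptions) solves Eqs. (1) in closed form, checks that $d_t > 0$, that $|S_t| \le 1$ on $[t, 1]$ with the three alternation points at $t$, $x_t$, $1$, that $x_t$ increases with $t$, and, for one pair $t_1 = 0 < t_2 = 0.3$, that $S_{t_1} - S_{t_2}$ changes sign exactly once in $(t_2, 1)$, as the Rolle argument requires.

```python
import math

def cubic_example(t):
    """Best uniform approximation a_t + b_t*x to x**3 on [t, 1], for -1/2 <= t < 1,
    from the reference-set equations (1) at the points t, x_t, 1."""
    b = t * t + t + 1.0                        # first equation minus third equation
    xt = math.sqrt(b / 3.0)                    # interior extremal: 3x^2 - b_t = 0
    a = (t**3 + xt**3 - b * (t + xt)) / 2.0    # first equation plus second equation
    d = t**3 - (a + b * t)                     # levelled error d_t
    return a, b, xt, d

def s_t(x, t):
    a, b, _, d = cubic_example(t)
    return (x**3 - (a + b * x)) / d

def sign_changes(values):
    """Count strict sign changes in a sequence of samples."""
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0)

ts = [-0.5, -0.25, 0.0, 0.3, 0.6, 0.9]
for t in ts:
    a, b, xt, d = cubic_example(t)
    grid = [t + (1.0 - t) * i / 2000 for i in range(2001)]
    peak = max(abs(s_t(x, t)) for x in grid)
    print(f"t={t:+.2f}  x_t={xt:.4f}  d_t={d:.5f}  max|S_t|={peak:.6f}")

# Property (g) for t1 = 0, t2 = 0.3: one sign change of S_t1 - S_t2 in (t2, 1).
t1, t2 = 0.0, 0.3
grid = [t2 + (1.0 - 1e-3 - t2) * i / 5000 for i in range(1, 5001)]
diffs = [s_t(x, t1) - s_t(x, t2) for x in grid]
print(sign_changes(diffs))
```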


References

1. G. Meinardus, "Approximation of Functions: Theory and Numerical Methods," Springer-Verlag, New York, 1967.

2. R. L. Streit, Extremals and zeros in Markov systems are monotone functions of one endpoint, in "Theory of Approximation with Applications (Proc. Conf., Univ. Calgary, 1975)," pp. 387–401, Academic Press, New York, 1976.

3. R. Zielke, Concertina-like movements of the error curve in the alternation theorem, Manuscripta Math. 22 (1977), 229–234.




Limits  of  Chebyshev  Polynomials 
When  The  Argument  Is 
A  Ratio  of  Cosines 


R.  L.  Streit 




Note 


Limits  of  Chebyshev  Polynomials  When  the 
Argument  Is  a  Ratio  of  Cosines 

Roy  L.  Streit 

Naval Underwater Systems Center,

New London Laboratory, New London, Connecticut 06320, U.S.A.

Communicated by Oved Shisha

Received  January  25,  1983 


Two  new  limits  involving  Chebyshev  polynomials  of  the  first  and  second  kinds 
are  given.  These  limits  are  useful  in  certain  engineering  applications.  The  proofs  are 
based  on  the  Mehler-Heine  theorem  for  Jacobi  polynomials. 


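Let $P_n^{(\alpha,\beta)}(x)$ denote the Jacobi polynomials. It is evident from the representation [1, (4.21.2)]

$$P_n^{(\alpha,\beta)}(x) = \frac{1}{n!}\sum_{\nu=0}^{n}\binom{n}{\nu}\,(n+\alpha+\beta+1)(n+\alpha+\beta+2)\cdots(n+\alpha+\beta+\nu)\,(\alpha+\nu+1)(\alpha+\nu+2)\cdots(\alpha+n)\left(\frac{x-1}{2}\right)^{\nu} \qquad (1)$$

that $P_n^{(\alpha,\beta)}(x)$ is a polynomial of degree $n$ in $x$ and in the parameters $\alpha$ and $\beta$. Hence, $P_n^{(\alpha,\beta)}(x)$ can be extended to all complex values of $\alpha$, $\beta$, and $x$. In this note, $\alpha$ and $\beta$ are restricted to be real numbers.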
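Representation (1) can be evaluated term by term, which also makes it easy to confirm the two Jacobi–Chebyshev identities used later in this note, $P_n^{(-1/2,-1/2)} = \frac{(2n)!}{2^{2n}(n!)^2}T_n$ and $P_n^{(1/2,1/2)} = \frac{(2n+2)!}{2^{2n+1}((n+1)!)^2}U_n$. The sketch below is a plain-Python check with an illustrative helper name `jacobi_p`; it compares against $T_n(\cos\theta) = \cos n\theta$ and $U_n(\cos\theta) = \sin(n+1)\theta/\sin\theta$ at a few sample points.

```python
import math

def jacobi_p(n, alpha, beta, x):
    """P_n^(alpha,beta)(x) evaluated directly from the finite sum (1)."""
    total = 0.0
    for nu in range(n + 1):
        rising = 1.0                       # (n+a+b+1)(n+a+b+2)...(n+a+b+nu)
        for m in range(1, nu + 1):
            rising *= n + alpha + beta + m
        tail = 1.0                         # (a+nu+1)(a+nu+2)...(a+n)
        for m in range(nu + 1, n + 1):
            tail *= alpha + m
        total += math.comb(n, nu) * rising * tail * ((x - 1.0) / 2.0) ** nu
    return total / math.factorial(n)

# Worst-case discrepancy in the two identities, with x = cos(theta).
err_t = err_u = 0.0
for n in range(1, 7):
    for x in (-0.8, -0.1, 0.35, 0.9):
        th = math.acos(x)
        t_n = math.cos(n * th)
        u_n = math.sin((n + 1) * th) / math.sin(th)
        err_t = max(err_t, abs(jacobi_p(n, -0.5, -0.5, x)
                               - math.comb(2 * n, n) / 4**n * t_n))
        err_u = max(err_u, abs(jacobi_p(n, 0.5, 0.5, x)
                               - math.comb(2 * n + 2, n + 1) / 2 ** (2 * n + 1) * u_n))
print(err_t, err_u)
```

At $x = 1$ only the $\nu = 0$ term survives, giving $P_n^{(\alpha,\beta)}(1) = \binom{n+\alpha}{n}$, a convenient spot check.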

For any complex number $x$, the Mehler–Heine theorem [1, Theorem 8.1.1] states that

$$\lim_{n\to\infty} n^{-\alpha} P_n^{(\alpha,\beta)}\!\left(\cos\frac{x}{n}\right) = (x/2)^{-\alpha} J_\alpha(x), \qquad (2)$$

where $J_\alpha(x)$ is the Bessel function of the first kind of order $\alpha$ [1, (1.71.1)].
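Szegő's proof of (2) actually establishes that

$$\lim_{n\to\infty} n^{-\alpha} P_n^{(\alpha,\beta)}\!\left(\cos\left(\frac{x}{n} + o(n^{-1})\right)\right) = (x/2)^{-\alpha} J_\alpha(x).$$

Consequently, since $\cos(x/n)/\cos(y/n)$ and $\cos\big(\sqrt{x^2 - y^2}/n\big)$ differ by $O(n^{-4})$, for all complex $x$ and $y$,

$$\lim_{n\to\infty} n^{-\alpha} P_n^{(\alpha,\beta)}\!\left(\frac{\cos(x/n)}{\cos(y/n)}\right) = \left(\tfrac12\sqrt{x^2 - y^2}\right)^{-\alpha} J_\alpha\!\left(\sqrt{x^2 - y^2}\right). \qquad (3)$$

Like the Mehler–Heine theorem, this result holds uniformly for $x$ and $y$ in every bounded region of the complex plane.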

The limit (3) has interesting special forms for the Chebyshev polynomials $T_n(z)$ and $U_n(z)$ of the first and second kinds, respectively. Substituting the identities

$$P_n^{(-1/2,-1/2)}(z) = \frac{(2n)!}{2^{2n}(n!)^2}\,T_n(z), \qquad n \ge 1,$$

and

$$J_{-1/2}(z) = \left(\frac{2}{\pi z}\right)^{1/2}\cos z$$

in (3) and applying Stirling's formula gives

$$\lim_{n\to\infty} T_n\!\left(\frac{\cos(x/n)}{\cos(y/n)}\right) = \cos\sqrt{x^2 - y^2}. \qquad (4)$$

Similarly, substituting

$$P_n^{(1/2,1/2)}(z) = \frac{(2n+2)!}{2^{2n+1}((n+1)!)^2}\,U_n(z)$$

and

$$J_{1/2}(z) = \left(\frac{2}{\pi z}\right)^{1/2}\sin z$$

in (3) and applying Stirling's formula gives

$$\lim_{n\to\infty} n^{-1}\,U_n\!\left(\frac{\cos(x/n)}{\cos(y/n)}\right) = \frac{\sin\sqrt{x^2 - y^2}}{\sqrt{x^2 - y^2}}. \qquad (5)$$
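The limiting forms (4) and (5) can be observed numerically for a large but finite $n$. In this sketch (plain Python; the helper names, $n = 4000$, and the sample pairs $(x, y)$ are illustrative choices), $T_n$ and $U_n$ are evaluated by the three-term recurrence, which stays valid when the argument exceeds 1, and $\sqrt{x^2 - y^2}$ is taken as an imaginary number when $|y| > |x|$, so that the cosine becomes a hyperbolic cosine.

```python
import cmath
import math

def cheb_tu(n, z):
    """Return (T_n(z), U_n(z)) by the three-term recurrence; any real z, n >= 1."""
    t_prev, t_cur = 1.0, z
    u_prev, u_cur = 1.0, 2.0 * z
    for _ in range(n - 1):
        t_prev, t_cur = t_cur, 2.0 * z * t_cur - t_prev
        u_prev, u_cur = u_cur, 2.0 * z * u_cur - u_prev
    return t_cur, u_cur

def limits(x, y):
    """Right-hand sides of (4) and (5); w is imaginary when |y| > |x|."""
    w = cmath.sqrt(x * x - y * y)
    return cmath.cos(w).real, (cmath.sin(w) / w).real

n = 4000
pairs = [(3.0, 1.0), (1.0, 2.0), (6.0, 2.5)]
for x, y in pairs:
    z = math.cos(x / n) / math.cos(y / n)
    t_n, u_n = cheb_tu(n, z)
    lim_t, lim_u = limits(x, y)
    print(f"x={x}, y={y}:  T_n={t_n:.6f} vs {lim_t:.6f},  U_n/n={u_n / n:.6f} vs {lim_u:.6f}")
```

The $T_n$ values agree to many digits, while $n^{-1}U_n$ converges more slowly (roughly like $1/n$), which is consistent with the extra factor $n^{-1}$ in (5).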


These  limiting  forms  do  not  seem  to  be  mentioned  elsewhere  in  the  literature. 

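A result similar to (4) is used implicitly in an antenna design application [2]. The result (5) is shown in [3] to be intimately related to the so-called Kaiser–Bessel window in digital filter design. These applications require knowledge of the cosine transform of the right-hand side of (3), which is provided by a special case of Sonine's second finite integral [4, p. 376] for $\alpha > -\tfrac12$. In particular,

$$\left(\tfrac12\sqrt{x^2-y^2}\right)^{-\alpha} J_\alpha\!\left(\sqrt{x^2-y^2}\right) = 2^{\alpha}\sqrt{\frac{2}{\pi}}\; y^{\frac12-\alpha}\int_0^1 (1-\xi^2)^{\frac{2\alpha-1}{4}}\, I_{\alpha-\frac12}\!\left(y\sqrt{1-\xi^2}\right)\cos x\xi\,d\xi, \qquad (6)$$

where $I_\nu(z)$ is the modified Bessel function of order $\nu$. Sonine's second finite integral diverges for $\alpha = -\tfrac12$; however, the cosine transform of $\cos\big((x^2 - y^2)^{1/2}\big)$ is known [5, (871.2)], so that

$$\lim_{n\to\infty} T_n\!\left(\frac{\cos(x/n)}{\cos(y/n)}\right) = \cos x + y\int_0^1 \frac{I_1\!\left(y\sqrt{1-\xi^2}\right)}{\sqrt{1-\xi^2}}\cos x\xi\,d\xi. \qquad (7)$$

It is evident from (6) and (7) that the limiting forms are cosine transforms of functions supported on a finite interval (i.e., they are bandlimited) and, thus, are entire functions of exponential type.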
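Two cosine-transform pairs sit behind (6) and (7): the $\alpha = \tfrac12$ (Kaiser–Bessel) case, $\int_0^1 I_0\big(y\sqrt{1-\xi^2}\big)\cos x\xi\,d\xi = \sin\sqrt{x^2-y^2}/\sqrt{x^2-y^2}$, and the van der Maas form $\cos\sqrt{x^2-y^2} = \cos x + y\int_0^1 I_1\big(y\sqrt{1-\xi^2}\big)(1-\xi^2)^{-1/2}\cos x\xi\,d\xi$. The sketch below checks both with a 64-point Gauss-Legendre rule on $[0, 1]$; it uses `numpy.i0` for $I_0$ and, as an assumption made to stay dependency-free, a truncated power series (hypothetical helper `bessel_i1`) for $I_1$, which is adequate only for moderate arguments.

```python
import cmath
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(64)
xi, wq = 0.5 * (nodes + 1.0), 0.5 * weights       # Gauss-Legendre rule on [0, 1]
root = np.sqrt(1.0 - xi ** 2)

def bessel_i1(u):
    """Modified Bessel function I_1(u) from its power series (moderate |u| only)."""
    term = u / 2.0
    total = term
    for k in range(1, 40):
        term = term * (u / 2.0) ** 2 / (k * (k + 1))
        total = total + term
    return total

def kaiser_side(x, y):
    """int_0^1 I_0(y sqrt(1 - xi^2)) cos(x xi) dxi, the alpha = 1/2 transform."""
    return np.sum(wq * np.i0(y * root) * np.cos(x * xi))

def van_der_maas_side(x, y):
    """cos x + y int_0^1 I_1(y sqrt(1 - xi^2)) / sqrt(1 - xi^2) cos(x xi) dxi."""
    return np.cos(x) + y * np.sum(wq * bessel_i1(y * root) / root * np.cos(x * xi))

pairs = [(3.0, 1.0), (0.5, 2.0), (10.0, 3.0)]
for x, y in pairs:
    w = cmath.sqrt(x * x - y * y)          # imaginary when |y| > |x|
    print(kaiser_side(x, y), (cmath.sin(w) / w).real)
    print(van_der_maas_side(x, y), cmath.cos(w).real)
```

Both integrands are smooth functions of $1 - \xi^2$, so the Gauss rule converges rapidly even though $\sqrt{1-\xi^2}$ has an endpoint singularity in its derivative.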

An extremal property of $\cos\big((x^2 - y^2)^{1/2}\big)$ in the space of functions of exponential type is given in [6]. The proof is based on a theorem in [7]. Whether or not the limit function (3) has extremal properties in this space is not known to the author.


References

1. G. Szegő, "Orthogonal Polynomials," 4th ed., Vol. 23, Amer. Math. Soc. Colloq. Publ., 1978.

2. G. J. van der Maas, A simplified calculation for Dolph–Tchebycheff arrays, J. Appl. Phys. 25 (1) (1954).

3. R. L. Streit, A two-parameter family of weights for nonrecursive digital filters and antennas, IEEE Trans. Acoust. Speech Signal Process. ASSP-32, 108–118.

4. G. N. Watson, "Theory of Bessel Functions," 2nd ed., Cambridge Univ. Press, London/New York, 1966.

5. G. A. Campbell and R. M. Foster, "Fourier Transforms for Practical Applications," Van Nostrand, Princeton, N.J., 1948.

6. V. Barcilon and G. C. Temes, Optimum impulse response and the van der Maas function, IEEE Trans. Circuit Theory CT-19 (4) (1972), 336–342.

7. R. J. Duffin and A. C. Schaeffer, Some properties of functions of exponential type, Bull. Amer. Math. Soc. 44 (1938), 236–240.




A  Routine  For  Numerical  Solution 
Of  Fredholm  Integral  Equations 

R.  L.  Streit  and  A.  H.  Nuttall 




Abstract 


A routine is presented for numerical solution of Fredholm integral equations of the first and second kinds and for determination of characteristic values and functions. An approximate method for solving integral equations of the first kind by means of integral equations of the second kind is presented and illustrated with several examples. Application to simultaneous determination of several characteristic values and functions of a kernel is also considered and documented by several examples.




TM  No. 
TC-108-72 


INTRODUCTION 

The interest here lies in obtaining approximations to the solution $g(x)$ of the Fredholm integral equation

$$\int_a^b dy\, K(x,y)\, g(y) = \lambda\, g(x) + f(x), \qquad a \le x \le b, \qquad (1)$$

where the kernel $K(x,y)$ and the function $f(x)$ are known. If $\lambda$ and $f(x)$ are nonzero, (1) is an integral equation of the second kind. If $\lambda$ is zero, (1) is an integral equation of the first kind. If $f(x)$ is zero, (1) is a characteristic value problem possessing solutions only for certain values of $\lambda$; in this case, we express (1) as

$$\int_a^b dy\, K(x,y)\, \phi_k(y) = \lambda_k\, \phi_k(x), \qquad a \le x \le b, \qquad (2)$$

where $\{\lambda_k\}$ are the characteristic values of the kernel and $\{\phi_k(x)\}$ are the characteristic functions (assumed to be of unit energy: $\int_a^b dx\, \phi_k^2(x) = 1$).

PROBLEM  SOLUTION 

The integral on the left side of (1) is approximated by any of the numerous integration rules available, such as the trapezoidal rule, Simpson's rule, Gauss quadrature, etc. That is,

$$\int_a^b dy\, K(x,y)\, g(y) \cong \sum_{j=1}^{N} W_j\, K(x, y_j)\, g(y_j), \qquad (3)$$

where $\{W_j\}$ are the weights and $\{y_j\}$ are the abscissas (points) of the particular integration rule adopted. The approximation to the true solution of (1) at the point $y_j$ is denoted by $\hat g(y_j)$. This approximation is obtained by utilizing (3) in (1) and requiring that

$$\sum_{j=1}^{N} W_j\, K(y_i, y_j)\, \hat g(y_j) = \lambda\, \hat g(y_i) + f(y_i), \qquad 1 \le i \le N. \qquad (4)$$

This method is known as the method of collocation. Equation (4) constitutes $N$ equations in the $N$ unknowns $\{\hat g(y_j)\}$.

We define four matrices F, G, B, and D as

$$F = [f(y_i)] \;(N \times 1), \quad G = [\hat g(y_i)] \;(N \times 1), \quad B = [K(y_i, y_j)] \;(N \times N), \quad D = [W_i\,\delta_{ij}] \;(N \times N). \qquad (5)$$

Equation (4) then takes the form

$$BDG = \lambda G + F, \qquad (6)$$

with solution

$$G = (BD - \lambda I)^{-1} F, \qquad (7)$$

where I is the identity matrix. (It should be noted that BD is not necessarily symmetric, even if B is symmetric.) If $\{\hat\lambda_k\}$ are the characteristic numbers of matrix BD, then

$$\det(BD - \lambda I) = \prod_{k=1}^{N} (\hat\lambda_k - \lambda). \qquad (8)$$

Therefore (7) possesses a solution if $\lambda \ne \hat\lambda_k$, $1 \le k \le N$.

The general approximate solution to the integral equation of the second kind is afforded by (7) and is considered in the next section. Solution of the integral equation of the first kind ($\lambda = 0$ in (1)) is considered in the succeeding section. The characteristic value problem ($f = 0$ in (1)) is considered last.
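The matrix formulation (5)-(7) maps directly onto a few lines of NumPy. This is a sketch under stated assumptions, not the memorandum's routine: it takes Gauss-Legendre abscissas and weights from `numpy.polynomial.legendre.leggauss`, the helper names `gauss_rule` and `solve_collocation` are hypothetical, and the test problem (kernel $e^{xy}$ on $[0, 1]$, $\lambda = -1$, exact solution $g(x) = 1$) is invented here only because its $f$ has a closed form.

```python
import numpy as np

def gauss_rule(a, b, n):
    """n-point Gauss-Legendre abscissas y_j and weights W_j on [a, b]."""
    t, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (b - a) * t + 0.5 * (a + b), 0.5 * (b - a) * w

def solve_collocation(kernel, f, lam, a, b, n):
    """Solve eq. (7), G = (BD - lam*I)^(-1) F, for the samples G = [g(y_i)]."""
    y, w = gauss_rule(a, b, n)
    B = kernel(y[:, None], y[None, :])      # B = [K(y_i, y_j)]
    BD = B * w[None, :]                      # same as B @ diag(W)
    G = np.linalg.solve(BD - lam * np.eye(n), f(y))
    return y, w, G

def kernel(x, y):
    """Hypothetical test kernel (not from the memorandum)."""
    return np.exp(x * y)

def f(x):
    """f = int_0^1 exp(x*y) * 1 dy - lam * 1 with lam = -1, so that g(x) = 1."""
    return np.expm1(x) / x + 1.0

y, w, G = solve_collocation(kernel, f, -1.0, 0.0, 1.0, 10)
print(np.max(np.abs(G - 1.0)))
```

Scaling the columns of B by the weights (`B * w[None, :]`) forms BD without building the diagonal matrix D explicitly; the linear solve is the only step whose cost grows as $N^3$.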




INTEGRAL  EQUATIONS  OF  THE  SECOND  KIND 

The Fredholm alternative guarantees that if the kernel $K(x,y)$ of (1) is square-integrable, then there exists a unique solution to (1) if $\lambda$ is not a characteristic value. Since $\lambda$ is assumed not to be a characteristic value in this section, the existence and uniqueness of solutions to (1) are established and are of no concern here. This is not the case for Fredholm integral equations of the first kind.

The numerical solution (7) of Fredholm integral equations of the second kind is an excellent approximation to the exact solution if the kernel $K(x,y)$ is smooth enough to permit a good approximation to the integral, as in (3), by some quadrature formula. However, even if the kernel is discontinuous, the approximation to the exact solution $g(x)$ can still be quite good. These rather vague remarks are best explained by examples.

Example  1 


-C'  “*3  *  J<‘>  +  ahr'1’  (KX<  '•  <*> 

The exact solution is $g(x) = x$. When a four-point Gaussian quadrature formula is employed, the computed solution $\hat g(y_i)$ at the points $\{y_i\}$ is correct to four significant places, as shown in Table 1; when a 20-point Gaussian quadrature formula is used, the computed solution agrees with the exact solution to 15 significant places.

Table 1

EXACT AND COMPUTED SOLUTIONS OF (9A); N = 4

 i    g(y_i) (exact)    ĝ(y_i) (computed)
 1    .0694318          .0694366
 2    .3300095          .3300128
 3    .6699905          .6699952
 4    .9305682          .9305794




Example  2 


$$\int_0^1 dy\, K(x,y)\, g(y) = g(x) - 1, \qquad 0 \le x \le 1, \qquad (9B)$$

where

$$K(x,y) = \begin{cases} 1, & 0 \le y \le x,\\ 0, & x < y \le 1. \end{cases}$$

(This example can also be interpreted as a Volterra integral equation.) The exact solution is $g(x) = e^x$. Using a 10-point Gaussian quadrature formula, despite the discontinuous kernel, yields the reasonably good solution of Table 2.


Table 2

EXACT AND COMPUTED SOLUTIONS OF (9B); N = 10

 y_i       ĝ(y_i) (computed)    g(y_i) = e^(y_i) (exact)
 .01305    1.034                1.013
 .06747    1.118                1.070
 .1603     1.256                1.174
 .2833     1.451                1.328
 .4256     1.702                1.530
 .5744     1.998                1.776
 .7167     2.308                2.048
 .8397     2.592                2.316
 .9325     2.802                2.541
 .9869     2.898                2.683
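The discontinuous-kernel case can be reproduced with the same collocation recipe. This hedged sketch assumes (9B) reads $\int_0^1 K(x,y)\,g(y)\,dy = g(x) - 1$ with $K(x,y) = 1$ for $y \le x$ and 0 otherwise (so $\lambda = 1$, $f = -1$, and $g(x) = e^x$); under that assumption the first two computed values round to the 1.034 and 1.118 of Table 2. The helper name `solve_example_9b` is hypothetical. Refining the rule reduces the error only slowly, because Gauss quadrature gains little from smoothness the integrand does not have.

```python
import numpy as np

def solve_example_9b(n):
    """Collocation (7) for the assumed (9B): lam = 1, f = -1, K = 1 if y <= x else 0."""
    t, w = np.polynomial.legendre.leggauss(n)
    y, w = 0.5 * t + 0.5, 0.5 * w                   # Gauss rule on [0, 1]
    order = np.argsort(y)
    y, w = y[order], w[order]
    B = (y[None, :] <= y[:, None]).astype(float)    # K(y_i, y_j): jump at y = x
    G = np.linalg.solve(B * w[None, :] - np.eye(n), -np.ones(n))
    return y, G

for n in (10, 40, 160):
    y, G = solve_example_9b(n)
    print(n, np.round(G[:2], 4), np.max(np.abs(G - np.exp(y))))
```

With the nodes sorted, BD is lower triangular, so $BD - I$ is nonsingular (its diagonal entries are $W_i - 1 \ne 0$) and the solve amounts to forward substitution.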

In most cases, however, the exact solution is unknown, and it might be thought that one way to determine whether or not the numerical solutions are accurate is to increase the order of the quadrature formula and solve the system (7) again. For Fredholm integral equations of the second kind, this process is feasible since the system (7) is well-conditioned, even for high-order quadrature formulae. The explanation for this phenomenon is that if $\lambda$ is not close to a characteristic value of matrix BD, then matrix $BD - \lambda I$ is well-conditioned, even though matrix BD itself can be ill-conditioned (as seen in the next section).




Once the approximate solution $\hat g(y_i)$ is known at a sufficient number of points, any form of interpolating function may be used to connect these points. It is worthwhile to note, however, that a ready-made interpolation procedure is already at hand once (7) has been solved. Namely, solve for $g(x)$ on the right-hand side of (1) and approximate the integral by (3) to obtain the interpolated approximation

$$\hat g(x) = \frac{1}{\lambda}\left[\sum_{j=1}^{N} W_j\, K(x, y_j)\, \hat g(y_j) - f(x)\right], \qquad (10)$$

which automatically goes through the points $\hat g(y_i)$ at $x = y_i$.
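Formula (10) costs one kernel evaluation per abscissa and reuses the already-solved samples. The sketch below illustrates it on a hypothetical test problem of my choosing (kernel $e^{xy}$ on $[0, 1]$, $\lambda = -1$, exact solution $g(x) = 1$), with the names `gauss_rule`, `kernel`, `f`, and `g_hat` invented for illustration. It checks that the interpolant reproduces the nodal values and is accurate between the nodes.

```python
import numpy as np

def gauss_rule(a, b, n):
    t, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (b - a) * t + 0.5 * (a + b), 0.5 * (b - a) * w

def kernel(x, y):            # hypothetical test kernel, not from the memorandum
    return np.exp(x * y)

def f(x):                    # chosen so that g(x) = 1 solves (1) with lam = -1
    return np.expm1(x) / x + 1.0

lam, n = -1.0, 8
y, w = gauss_rule(0.0, 1.0, n)
G = np.linalg.solve(kernel(y[:, None], y[None, :]) * w[None, :] - lam * np.eye(n), f(y))

def g_hat(x):
    """Interpolated approximation (10): (sum_j W_j K(x, y_j) g(y_j) - f(x)) / lam."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return (kernel(x[:, None], y[None, :]) @ (w * G) - f(x)) / lam

print(np.max(np.abs(g_hat(y) - G)))                         # reproduces the samples
print(np.max(np.abs(g_hat(np.linspace(0.05, 1.0, 20)) - 1.0)))
```

The interpolant inherits the smoothness of the kernel in $x$, which is why it is usually preferable to a generic polynomial fit through the samples.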

INTEGRAL  EQUATIONS  OF  THE  FIRST  KIND 

The Fredholm integral equation of the first kind may be obtained from (1) by putting $\lambda = 0$:

$$\int_a^b dy\, K(x,y)\, g(y) = f(x), \qquad a \le x \le b. \qquad (11)$$

When this equation is viewed as an operator on $g$, it is readily noticed that discontinuous functions $g$ are taken into continuous functions $f$. More specifically, define $L[a,b]$ to be the space of all functions $g$ such that $\int_a^b dx\, |g(x)|$ exists and is finite, and $C^n[a,b]$, $n \ge 0$, to be the space of all functions possessing at least $n$ continuous derivatives. Then equation (11) may be said to define an operator taking functions in $L[a,b]$ into functions in $C^n[a,b]$, where $n$ is an integer such that $K(x,y)$ possesses a continuous $n$th partial derivative with respect to $x$, since we have by a well-known theorem

$$f^{(n)}(x) = \int_a^b dy\, \frac{\partial^n K(x,y)}{\partial x^n}\, g(y).$$

In other words, the more partial derivatives the kernel $K(x,y)$ has with respect to $x$, the "smoother" the function $f(x)$ in (11) must be. Stated a third way, since $C^n[a,b]$ is a subspace of $L[a,b]$, any attempted inversion, numerical or otherwise, of equation (11) can pose considerable problems. In each particular integral equation of the first kind, such fundamental questions as existence and uniqueness of solutions are open to question.




If one attempts a straightforward application of the method of collocation by putting $\lambda = 0$ in equation (7), then one discovers that the matrix BD is often badly ill-conditioned, even for low-order quadrature formulae. Consider the following example.

Example 3

$$\int_0^1 dy\, \cos(\pi x y)\, g(y) = \frac{\sin \pi x}{\pi x}, \qquad 0 \le x \le 1. \qquad (12)$$

Here, a solution is given by $g(x) = 1$ for $0 \le x \le 1$. Gaussian quadrature of orders 5, 6, and 7 all give 6 or 7 correct significant digits for the approximate solution to (12), but for orders 8, 9, 10, and higher, the accuracy rapidly deteriorates, as shown in Table 3. For $N \ge 11$, the results get progressively worse because the matrix BD becomes more ill-conditioned.


Table 3

COMPUTED SOLUTIONS OF (12); N = 8, 9, 10

        N = 8                 N = 9                 N = 10
  y_i       ĝ(y_i)      y_i       ĝ(y_i)      y_i       ĝ(y_i)
 .01986     .99917     .01592    1.25796     .01305    5.70302
 .10167    1.00053     .08198     .84076     .06747   -1.84768
 .23723     .99987     .19331    1.03748     .16029    1.62953
 .40828    1.00003     .33787     .99216     .28330     .88319
 .59172     .99999     .50000    1.00185     .42556    1.02306
 .76277    1.00000     .66213     .99946     .57444     .99467
 .89833    1.00000     .80669    1.00020     .71669    1.00148
 .98014    1.00000     .91802     .99991     .83970     .99950
                       .98408    1.00004     .93253    1.00021
                                             .98695     .99991
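The deterioration in Table 3 can be traced to the condition number of BD. This sketch takes the right-hand side of (12) to be $\sin(\pi x)/(\pi x)$, which is what $g(x) = 1$ produces, and prints $\mathrm{cond}(BD)$ and the collocation error with $\lambda = 0$ for $N = 5, \ldots, 10$. NumPy's `np.sinc(y)` is exactly $\sin(\pi y)/(\pi y)$. The helper name `example3_system` is hypothetical, and the exact digits printed will depend on the floating-point library, so only the trend should be read from the output.

```python
import numpy as np

def example3_system(n):
    """BD and F for (12) with an n-point Gauss-Legendre rule on [0, 1]."""
    t, w = np.polynomial.legendre.leggauss(n)
    y, w = 0.5 * t + 0.5, 0.5 * w
    BD = np.cos(np.pi * np.outer(y, y)) * w[None, :]
    F = np.sinc(y)                          # sin(pi y) / (pi y)
    return BD, F

cond = {}
for n in range(5, 11):
    BD, F = example3_system(n)
    cond[n] = np.linalg.cond(BD)
    G = np.linalg.solve(BD, F)              # straightforward collocation, lam = 0
    print(f"N={n:2d}  cond(BD)={cond[n]:.1e}  max|G - 1|={np.max(np.abs(G - 1.0)):.1e}")
```

Because the kernel $\cos(\pi x y)$ is analytic, the singular values of BD decay faster than geometrically, so each added abscissa multiplies the condition number by a large factor.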

This example and several others give convincing evidence that the method of collocation is not a particularly good method to use directly on integral equations of the first kind, because of the difficulty of establishing the correctness of the results, even if a unique solution is known to exist. Unlike the case of integral equations of the second kind, no recourse to higher-order quadrature formulae is possible here, since the matrix BD becomes progressively more ill-conditioned.

Instead of putting $\lambda = 0$ in equation (1) to obtain the integral equation of the first kind immediately, why not let $\lambda$ approach 0 in steps, thus successively approximating the integral equation of the first kind with integral equations of the second kind? It might then be expected that the solutions of the integral equations of the second kind approach some solution of the integral equation of the first kind, provided, of course, that the integral equation of the first kind possesses solutions at all.

More specifically, suppose the sequence $\lambda_1, \lambda_2, \lambda_3, \ldots$ converges to 0, and that none of the $\lambda_k$ are characteristic values of the matrix BD. (It has been found computationally that little care is needed in choosing this sequence; however, if there is any cause for concern, it is certainly possible to compute the characteristic values of the matrix BD and choose the $\lambda_k$ to be midway between adjacent characteristic values.) Let $g_k(x)$ be the solution of

$$\int_a^b dy\, K(x,y)\, g_k(y) = \lambda_k\, g_k(x) + f(x), \qquad a \le x \le b. \qquad (13)$$

It is believed that $g_k(x)$ converges uniformly to some bounded solution $g(x)$ of (11) if any such solutions of (11) exist. Unfortunately, the only support for this statement presented herein is computational in nature.
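The stepping scheme (13) amounts to solving (7) for a sequence of small $\lambda_k$. The sketch below is an illustration on a hypothetical first-kind problem that does not appear in the memorandum: kernel $e^{-(x-y)^2}$ on $[0, 1]$ with exact solution $g(x) = 1$, so $f(x) = \tfrac{\sqrt\pi}{2}[\mathrm{erf}(x) + \mathrm{erf}(1-x)]$. The negative values of $\lambda_k$ are also an assumption. They are convenient here because this kernel is positive definite, so BD is similar to a positive semidefinite matrix and $BD - \lambda_k I$ can never be singular for $\lambda_k < 0$.

```python
import math
import numpy as np

# Hypothetical first-kind problem: int_0^1 exp(-(x - y)^2) g(y) dy = f(x), g = 1.
t, w = np.polynomial.legendre.leggauss(10)
y, w = 0.5 * t + 0.5, 0.5 * w
BD = np.exp(-np.subtract.outer(y, y) ** 2) * w[None, :]
F = np.array([0.5 * math.sqrt(math.pi) * (math.erf(v) + math.erf(1.0 - v)) for v in y])

errors = []
for lam in (-1e-2, -1e-4, -1e-6, -1e-8):      # assumed sequence lam_k -> 0, eq. (13)
    G = np.linalg.solve(BD - lam * np.eye(len(y)), F)
    errors.append(np.max(np.abs(G - 1.0)))
    print(f"lam={lam:.0e}  max|g_k - 1| = {errors[-1]:.2e}")
print(f"cond(BD) at lam = 0: {np.linalg.cond(BD):.1e}")
```

The error at each step is $\lambda_k (BD - \lambda_k I)^{-1}\mathbf{1}$ apart from rounding, so it shrinks as $\lambda_k \to 0$ until the amplified rounding error, roughly machine precision divided by $|\lambda_k|$, takes over.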

The  next  five  examples  are  intended  to  be  a  good  sample  of  situations  that 
actually  arise  in  the  application  of  this  method. 

Example  4 

(*-  9<s)  --K-1)  °< x<  '• 

Since the integral on the left side of (14) must always yield an expression of the form $\alpha x + \beta$ for any function $g(y)$, there does not exist a solution to this equation. However, the equation




(x-  9*^  +  2*X  ”  *>  °<X<I,  05) 

certainly has a unique solution $g_k(x)$ for each noncharacteristic value of $\lambda_k$, and this solution must obviously be a quadratic function of its argument $x$. Table 4 shows the results of solving (15) for $\lambda_1$ and $\lambda_2$ of magnitude $10^{-4}$ and $10^{-8}$, respectively, and $\lambda_3 = 0$, using a 6-point Gaussian quadrature formula. On the basis of the information in Table 4, one is led to the (correct) conclusion that there do not exist any bounded solutions of (14).


Table 4

COMPUTED SOLUTIONS OF (14); N = 6

 y_i        ĝ_1(y_i)       ĝ_2(y_i)       ĝ_3(y_i)
 .033765    .169443E4*     .169037E8     -.190339E19
 .169395    .165699E3      .162407E7      .396491E18
 .380690   -.111633E4     -.111843E8      .858363E18
 .619309   -.953767E3     -.954527E7      .162675E19
 .830605    .616148E3      .616577E7     -.511557E19
 .966235    .232968E4      .233087E8      .505322E19

Example  5 

f  l 

J„  (*-  T^-)  3<J)  *  I-* -  ■'  »<  * ‘  I.  00 

Here,  we  must  obviously  have  An  example  is  9*1*/) *  fH|. 

Since any function orthogonal to both 1 and $y$ may be added to the particular solution $1 + y$, the solution to (16) is not unique. Choosing two nonzero values $\lambda_1$ and $\lambda_2$, and $\lambda_3 = 0$, gives the results of Table 5 when Gaussian quadrature with only 5 points is used. The results for $g_1$ and $g_2$ are excellent. If higher-order Gaussian


* E4 denotes a multiplicative factor of $10^4$.



Table 5

COMPUTED SOLUTIONS OF (16); N = 5

 y_i          ĝ_1(y_i)     ĝ_2(y_i)     ĝ_3(y_i)
 .04691008    1.0469103    1.0469101    -41.338
 .23076534    1.2307654    1.2307653     33.503
 .50000000    1.4999999    1.5000000      1.000
 .76923466    1.7691345    1.7692347     -4.000
 .95308992    1.9530897    1.9530899     -8.000

formulae are used, even better results may be obtained. It is easily seen that the solutions are converging to the particular solution $g(y) = 1 + y$.


Example 6

$$\int_0^1 dy\, \cos(\pi x y)\, g(y) = \frac{\sin 2x}{\pi x}, \qquad 0 \le x \le 1. \qquad (17)$$

Here there exists a solution with a simple discontinuity:

$$g(x) = \begin{cases} 1, & 0 \le x < 2/\pi \;(\approx .6366),\\ 0, & 2/\pi < x \le 1. \end{cases}$$

Using an 11-point Gaussian quadrature formula and three values $\lambda_1, \lambda_2, \lambda_3$ successively closer to 0, we get the solutions of Table 6. These solutions are not better because the integrand is, in fact, not continuous, so that the quadrature is not particularly good.


Table 6

COMPUTED SOLUTIONS OF (17); N = 11

 y_i       ĝ_1(y_i)    ĝ_2(y_i)    ĝ_3(y_i)
 .01089     .91588     1.0159      1.0091
 .05647     .92752     1.0095      1.0087
 .13492     .97852      .9866      1.0028
 .24045    1.06966      .9786       .9784
 .36523    1.10212     1.0427      1.0045
 .50000     .91608     1.0075      1.0526
 .63477     .49963      .5399       .5352
 .75955     .08230     -.0399      -.0997
 .86508    -.09775     -.0649       .0519
 .94353    -.03132      .1090      -.0315
 .98911     .10835     -.0955       .0182

Example 7

$$\int_0^1 dy\, \cos(\pi x y)\, g(y) = \cos x, \qquad 0 \le x \le 1. \qquad (18)$$

Here there exists at least one solution with a very bad kind of discontinuity, namely, the Dirac delta function $\delta(x - 1/\pi)$. With three values $\lambda_1, \lambda_2, \lambda_3$ successively closer to 0 and with 12-point Gaussian quadrature, a solution resembling a continuous approximation of $g(x)$ does indeed show up; see Table 7. Considering the numerical difficulty of this problem, the solutions shown in Table 7 should be considered good results. Better solutions may be had with higher-order quadrature formulae.


Table 7

COMPUTED SOLUTIONS OF (18); N = 12

 y_i       ĝ_1(y_i)   ĝ_2(y_i)   ĝ_3(y_i)
 .911E-2    .737      -1.376     -1.399
 .479E-1    .805      -1.069     -1.149
 .115      1.120        .297       .059
 .206      1.795       2.871      2.853
 .316      2.509       4.553      5.213
 .437      2.475       2.737      2.404
 .563      1.274       -.568     -1.267
 .684      -.258       -.748       .582
 .794      -.779        .798      -.224
 .885      -.178       -.004       .057
 .952       .408       -.829       .014
 .991       .397        .995      -.024

Example 8

$$\int_0^1 dy\, \cos(\pi x y)\, g(y) = \frac{\sin \pi x}{\pi x}, \qquad 0 \le x \le 1. \qquad (19)$$

This equation was treated in Example 3 in a straightforward manner, with poor results. Here we use three values $\lambda_1, \lambda_2, \lambda_3$ successively closer to 0, with a 10-point Gaussian quadrature formula. Whereas the solution obtained before with N = 10 was very poor, we now have the good results of Table 8. These solutions clearly converge to the solution $g(x) = 1$.


Table 8

COMPUTED SOLUTIONS OF (19); N = 10

 y_i             ĝ_1(y_i)         ĝ_2(y_i)         ĝ_3(y_i)
 .01304 67357    .99739 04370     .99999 40301    1.00000 00067
 .06746 83167    .99736 45162     .99999 44020     .99999 99954
 .16029 52159    .99727 35342     .99999 60232     .99999 99962
 .28330 23029    .99727 36762     .99999 90304    1.00000 00072
 .42556 28305    .99795 80272    1.00000 12796     .99999 99941
 .57443 71695    .99981 58041    1.00000 05868    1.00000 00028
 .71669 76971   1.00192 20787     .99999 90874     .99999 99998
 .83970 47841   1.00204 72316     .99999 99732     .99999 99987
 .93253 16833    .99923 61422    1.00000 08581    1.00000 00019
 .98695 32643    .99565 76614     .99999 90819     .99999 99985

Thus, the proposed method appears to work well. It can indicate the nonexistence of exact solutions, or the existence of "spikes" within an exact solution. If the exact solution contains simple discontinuities, the computed solution can give an acceptable indication of this fact. The method is best, of course, when the solutions are all continuous. Its most interesting feature is its ability to yield a particular solution even when the solution to the integral equation is not unique.

CHARACTERISTIC  VALUE  PROBLEMS 

When $f(x)$ is zero in (1), the integral equation takes the form of (2). The general approximate solution* G in (7) will then be identically zero unless

$$\lambda = \hat\lambda_1 \;\text{ or }\; \lambda = \hat\lambda_2 \;\text{ or } \cdots \text{ or }\; \lambda = \hat\lambda_N, \qquad (20)$$

where $\{\hat\lambda_n\}$ are the characteristic values of matrix BD. Each characteristic value $\hat\lambda_n$ ($1 \le n \le N$) of matrix BD is an approximation to the corresponding one of the first N characteristic values $\{\lambda_n\}$ of kernel $K(x,y)$ (the lower-order ones being better approximations). For $\lambda = \hat\lambda_n$, (6) becomes (since F = 0 here)

$$BD\,\hat G_n = \hat\lambda_n\, \hat G_n. \qquad (21)$$

* This method is also suggested in Ref. 1, p. 447.


Thus $\hat G_n$ is proportional to the normalized characteristic vector of matrix BD corresponding to characteristic value $\hat\lambda_n$. Scaled versions of $\hat G_n$ are approximations to the characteristic functions of kernel $K(x,y)$. That is,

$$\phi_n(y_i) \cong C_n\, \hat g_n(y_i), \qquad 1 \le i \le N, \qquad (22)$$

for $1 \le n \le N$, where $\hat g_n(y_i)$ denotes the $i$th component of $\hat G_n$ and $C_n$ is a scale factor. Thus matrix BD contains all the information necessary for simultaneous approximate determination of N characteristic values and functions of kernel $K(x,y)$.


The method above yields samples of the approximate solution $\hat g_n(x)$ only at the points $x = y_i$, $1 \le i \le N$. Evaluation of $\hat g_n(x)$ could be accomplished by polynomial interpolation, for example, or by employing (3) in (1) to obtain

$$\hat g_n(x) = \frac{1}{\hat\lambda_n}\sum_{j=1}^{N} W_j\, K(x, y_j)\, \hat g_n(y_j). \qquad (23)$$

The approximation obtained from the right side of (23) automatically goes through the values $\hat g_n(y_i)$, $1 \le i \le N$; see (4).


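In order to obtain an approximation to the characteristic function $\phi_n(x)$ of (2), we let the approximation be

$$\hat\phi_n(x) = C_n\, \hat g_n(x) \qquad (24)$$

and choose $C_n$ such that $\hat\phi_n$ has unit energy. That is,

$$1 = \int_a^b dx\, \hat\phi_n^2(x) = C_n^2 \int_a^b dx\, \hat g_n^2(x) \cong C_n^2 \sum_{j=1}^{N} W_j\, \hat g_n^2(y_j). \qquad (25)$$

Solving for $C_n$ and substituting in (24) and (23), there follows

$$\hat\phi_n(x) = \frac{\sum_{j=1}^{N} W_j\, K(x, y_j)\, \hat g_n(y_j)}{\hat\lambda_n \left[\sum_{j=1}^{N} W_j\, \hat g_n^2(y_j)\right]^{1/2}}. \qquad (26)$$

The right side of (26) is an interpolated approximation to the $n$th characteristic function of kernel $K(x,y)$.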



Even  and  Odd  Solutions 

Before embarking on examples, it should be pointed out that advantage should be taken of any symmetry properties of the kernel $K(x,y)$. Suppose the integral equation is of the form

$$\int_{-a}^{a} dy\, K(x,y)\, g(y) = \lambda\, g(x), \qquad -a \le x \le a. \qquad (27)$$

Now suppose that the kernel satisfies

$$K(-x, y) = K(x, y). \qquad (28)$$

Then

$$\lambda\, g(-x) = \int_{-a}^{a} dy\, K(-x,y)\, g(y) = \int_{-a}^{a} dy\, K(x,y)\, g(y) = \lambda\, g(x). \qquad (29)$$

Therefore $g(x)$ must be even (for $\lambda \ne 0$). Then, letting $t = -y$ and using the evenness of $g(x)$,

$$\int_{-a}^{0} dy\, K(x,y)\, g(y) = \int_0^a dt\, K(x,-t)\, g(t). \qquad (30)$$

Therefore (27) becomes

$$\int_0^a dy\, [K(x,y) + K(x,-y)]\, g(y) = \lambda\, g(x), \qquad 0 \le x \le a, \qquad (31)$$

and the interval has been cut in half. This is very advantageous for numerical computation, since either fewer computations need be carried out, or increased accuracy for the same number of computations can be realized.

For  example,  for  the  kernel 


K(^'  *  +  A.Bci»(»])  ’  (32) 

(28) is satisfied, leading to even $g(x)$. However, for the kernel

Kl*,s)m  T+Bw(.-a)  ’  «) 

(28) is not satisfied, and $g(x)$ need not be even.

Similarly, if

$$K(-x, y) = -K(x, y), \qquad (34)$$

then $g(x)$ must be odd; again, the interval can be cut in half in a manner similar to (31).

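Example 9

$$\int_0^1 dy \left[\frac{1}{1 + h^2 - 2h\cos\pi(x-y)} + \frac{1}{1 + h^2 - 2h\cos\pi(x+y)}\right]\phi(y) = \lambda\,\phi(x), \qquad 0 \le x \le 1. \qquad (35)$$

Since

$$K(x,y) = \frac{1}{1 + h^2 - 2h\cos\pi(x-y)} + \frac{1}{1 + h^2 - 2h\cos\pi(x+y)} = \frac{1}{1-h^2}\left[2 + 4\sum_{k=1}^{\infty} h^k \cos(k\pi x)\cos(k\pi y)\right], \qquad |h| < 1, \qquad (36)$$

we have

$$\lambda_k = \frac{2h^k}{1-h^2}, \quad k \ge 0; \qquad \phi_0(x) = 1, \quad \phi_k(x) = \sqrt{2}\cos(k\pi x), \quad k \ge 1. \qquad (37)$$

The first integration rule adopted in (3) is the trapezoidal rule. For $h = 0.5$, the approximations to the characteristic values for N = 9 and N = 20 are listed in Table 9.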


Table 9

CHARACTERISTIC VALUES VIA TRAPEZOIDAL RULE; h = 0.5

 N = 9       N = 20      Exact
 2.666675    2.666666    2.666667
 1.333435    1.333333    1.333333
 0.666839    0.666666    0.666667
 0.333664    0.333333    0.333333
 0.167320    0.166667    0.166667
 0.084637    0.083333    0.083333
 0.044271    0.041667    0.041667
 0.026042    0.020833    0.020833
 0.020834    0.010417    0.010417



The approximations $\lambda_0$ through $\lambda_8$ for N = 20 are seen to agree with the exact values to within one unit in the sixth decimal place. Those for N = 9 are less accurate.

All the approximations to the characteristic functions, for N = 9, were found to be accurate to at least six decimal places at the points $x = y_i$, even for $\phi_8(x)$. Thus, far better accuracy was obtained in estimating the characteristic functions than in estimating the characteristic values, for the kernel of (35).

When the interpolated approximation (26) was compared with the exact characteristic function (37), the worst ratio of answers was 0.9982. Thus, the interpolated function is not as accurate as the sample values at the points $\{y_i\}$.
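Table 9 can be regenerated along the same lines. The sketch below is a hedged reconstruction, not the original program: it assumes that N counts abscissas (N = 9 means eight panels on (0, 1)), forms $A_{ij} = K(y_i, y_j)\,w_j$ for the kernel (35), and compares the result with (37). Under these assumptions the N = 9 values should closely match the first column of Table 9. The small excess over the exact values appears to come from the higher harmonics in (36), which coincide with lower ones on the coarse grid.

```python
import numpy as np

def trapezoid(n, lo=0.0, hi=1.0):
    """n equispaced abscissas on [lo, hi] with Trapezoidal weights."""
    y = np.linspace(lo, hi, n)
    w = np.full(n, (hi - lo) / (n - 1))
    w[[0, -1]] *= 0.5                    # half weights at the end points
    return y, w

def kernel35(x, y, h):
    """Kernel (35)."""
    c = 1.0 + h * h
    return (1.0 / (c - 2.0 * h * np.cos(np.pi * (x - y)))
            + 1.0 / (c - 2.0 * h * np.cos(np.pi * (x + y))))

def char_values(n, h):
    """Characteristic values of A = B D, largest first."""
    y, w = trapezoid(n)
    s = np.sqrt(w)
    b = kernel35(y[:, None], y[None, :], h)
    # B D is similar to the symmetric matrix D^(1/2) B D^(1/2)
    return np.linalg.eigvalsh(s[:, None] * b * s[None, :])[::-1]

h = 0.5
exact = 2.0 * h ** np.arange(9) / (1.0 - h * h)    # equation (37)
for n in (9, 20):
    print(n, np.round(char_values(n, h)[:9], 6))
print("exact", np.round(exact, 6))
```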

When h was increased to 0.9, the approximate characteristic numbers were less accurate for N = 20, as depicted in Table 10. This is due to the fact that the kernel of (35) now has a large maximum-to-minimum ratio, and the approximation of (3) is not adequate. A larger value of N is needed in this case. (In practice, two different values of N would be tried, and increased until substantially the same results were obtained.) However, it was again found that good approximations to the characteristic functions were obtained. In fact, at least four decimal places were retained for the low-order characteristic functions at the points $\{y_i\}$. However, the interpolated approximation was far less accurate, being about 10% in error.

Table 10

CHARACTERISTIC VALUES VIA TRAPEZOIDAL RULE; h = 0.9

  N = 20      Exact
  (missing)   10.526
  9.867        9.474
  8.926        8.526
  8.085        7.674




When the integration rule was that of Gauss quadrature, the results of Table 11 were obtained for h = 0.5. These results are not as good as those obtained via the Trapezoidal rule. It is believed that the reason for this behavior is that the kernel is even about y = 0 and y = 1, for any x. And since the characteristic function $\phi_n(y)$ is also even about y = 0 and y = 1, the Trapezoidal rule with end correction (Ref. 2) reduces to the simple case adopted here. Then, since we are integrating over a period of a periodic function, very accurate results are possible via the Trapezoidal rule (see Ref. 2, sect. 2.9). For more general kernels, this behavior need not be expected. (However, another reason for choosing the Trapezoidal rule will be presented later in Example 11.) The characteristic functions obtained via Gauss quadrature were of accuracy comparable to that obtained for the characteristic values.

Table 11

CHARACTERISTIC VALUES VIA GAUSS QUADRATURE; h = 0.5

  N = 9       Exact
  2.667186    2.666667
  1.333653    1.333333
  0.667836    0.666667
  0.335439    0.333333
  0.172508    0.166667
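The contrast between Tables 9 and 11 can be reproduced with a short sketch. It assumes the standard N-point Gauss-Legendre rule mapped from (-1, 1) to (0, 1) (NumPy's `numpy.polynomial.legendre.leggauss`) and the same collocation matrix $A_{ij} = K(y_i, y_j)\,w_j$ as before. It reports the largest absolute error in the first five characteristic values of the kernel (35) for each rule with N = 9. This is an illustration under those assumptions, not the memo's Gauss implementation.

```python
import numpy as np

def kernel35(x, y, h=0.5):
    """Kernel (35)."""
    c = 1.0 + h * h
    return (1.0 / (c - 2.0 * h * np.cos(np.pi * (x - y)))
            + 1.0 / (c - 2.0 * h * np.cos(np.pi * (x + y))))

def trapezoid(n):
    y = np.linspace(0.0, 1.0, n)
    w = np.full(n, 1.0 / (n - 1))
    w[[0, -1]] *= 0.5
    return y, w

def gauss(n):
    t, wt = np.polynomial.legendre.leggauss(n)   # nodes and weights on (-1, 1)
    return 0.5 * (t + 1.0), 0.5 * wt             # mapped to (0, 1)

def char_values(rule, n):
    y, w = rule(n)
    s = np.sqrt(w)
    b = kernel35(y[:, None], y[None, :])
    return np.linalg.eigvalsh(s[:, None] * b * s[None, :])[::-1]

exact = 2.0 * 0.5 ** np.arange(5) / 0.75          # equation (37) with h = 0.5
err_trap = np.abs(char_values(trapezoid, 9)[:5] - exact).max()
err_gauss = np.abs(char_values(gauss, 9)[:5] - exact).max()
print(f"max error, Trapezoidal N=9: {err_trap:.2e}")
print(f"max error, Gauss N=9:       {err_gauss:.2e}")
```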

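Example 10

$$K(x,y) = \frac{1}{1 + h^2 - 2h\cos[\pi(x-y)]}, \qquad -1 < x, y < 1. \qquad (38)$$

Since

$$K(x,y) = \frac{1}{1-h^2}\Big[1 + 2\sum_{n=1}^{\infty} h^n\big(\cos(n\pi x)\cos(n\pi y) + \sin(n\pi x)\sin(n\pi y)\big)\Big], \qquad |h| < 1, \qquad (39)$$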




we have




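$$\{\phi_n(x)\} = \Big\{\tfrac{1}{\sqrt{2}},\ \cos(\pi x),\ \sin(\pi x),\ \cos(2\pi x),\ \sin(2\pi x),\ \dots\Big\}, \qquad (40)$$

with $\lambda_0 = 2/(1-h^2)$ for $1/\sqrt{2}$ and the common value $\lambda_n = 2h^n/(1-h^2)$ for both $\cos(n\pi x)$ and $\sin(n\pi x)$. Thus, double roots occur for the characteristic values for this kernel.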

For the Trapezoidal rule of integration, and for h = 0.5, the results of the approximate technique are given in Table 12. There is no problem in evaluating the double roots, nor in obtaining two linearly independent characteristic vectors of the matrix BD. Again, very good accuracy in characteristic function approximation at the points $\{y_j\}$ was obtained.
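The double roots of Example 10 can be seen in a small sketch. Under the same assumptions as before (N abscissas, here on (-1, 1), Trapezoidal weights, and $A_{ij} = K(y_i, y_j)\,w_j$), it computes the N = 20 characteristic values for the kernel (38) with h = 0.5 and lists them beside the exact values $2/(1-h^2)$ and $2h^n/(1-h^2)$, the latter taken twice. In double precision the two members of each pair agree closely.

```python
import numpy as np

def kernel38(x, y, h=0.5):
    """Kernel (38) on (-1, 1)."""
    return 1.0 / (1.0 + h * h - 2.0 * h * np.cos(np.pi * (x - y)))

n_pts = 20
y = np.linspace(-1.0, 1.0, n_pts)
w = np.full(n_pts, 2.0 / (n_pts - 1))
w[[0, -1]] *= 0.5                       # Trapezoidal end weights
s = np.sqrt(w)
b = kernel38(y[:, None], y[None, :])
# Eigenvalues of B D via the similar symmetric matrix, largest first
lam = np.linalg.eigvalsh(s[:, None] * b * s[None, :])[::-1]

h = 0.5
exact = [2.0 / (1 - h * h)] + [2.0 * h**k / (1 - h * h) for k in (1, 1, 2, 2, 3, 3)]
for approx, ex in zip(lam[:7], exact):
    print(f"{approx:.6f}  {ex:.6f}")
```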

Table 12

CHARACTERISTIC VALUES VIA TRAPEZOIDAL RULE; h = 0.5

  N = 20      Exact
  2.666676    2.666667
  1.333346    1.333333
  1.333335    1.333333
  0.666688    0.666667
  0.666688    0.666667
  0.333375    0.333333
  0.333374    0.333333

Example 11

$$K(x,y) = \exp(-|x-y|), \qquad 0 < x, y < 1. \qquad (41)$$


The exact characteristic values and functions are given in Ref. 3, p. 116. The results of applying the Trapezoidal rule are given in Table 13 for N = 9, 20, and 50. The relatively poorer accuracy obtained in this example is due to the cusp in the kernel (41) at x = y.

Table 13

CHARACTERISTIC VALUES VIA TRAPEZOIDAL RULE

  N = 9       N = 20      N = 50      Exact
  0.738619    0.738777    0.738806    0.738811
  0.139433    0.138257    0.138042    0.138004
  0.047308    0.045474    0.045146    0.045088
  0.023878    0.021757    0.021393    0.021329
  0.015078    0.012721    0.012345    0.012279

Attempts to apply either Simpson's rule or Gauss quadrature to this example lead to poorer results than use of the Trapezoidal rule. The reason for this is that, for maximum accuracy, the integral in (2) should be broken into the ranges (0, x) and (x, 1), and the integrand approximated in each region. However, for Gauss quadrature, since x must be selected as the points $\{y_i\}$, the integrals over $(0, y_i)$ and $(y_i, 1)$ do not themselves yield to Gauss quadrature without additional, different sampling plans within these intervals. For equispaced choices of x (as for the Simpson and Trapezoidal rules), the parabolic approximation of Simpson's rule cannot be applied at certain points. For example, when $x = y_2$, there is only one panel in the interval (0, x). Also, as x is changed from $y_i$ to $y_{i+1}$, the number of panels changes from even to odd (or vice versa) in the intervals (0, x) and (x, 1). Thus a composite Simpson-Trapezoidal rule would have to be adopted. However, the simple Trapezoidal rule suffers no such problem, giving a linear approximation to the integrand for all $x = y_i$. Furthermore, the two half-weight factors applied to the point x in the two intervals (0, x) and (x, 1) combine to give the standard Trapezoidal rule applied to the entire interval (0, 1). Thus the method described earlier for forming the matrix D of Trapezoidal weights $\{w_i\}$ applies directly for this kernel, where the cusp lies at y = x. A similar conclusion would hold for a kernel that is discontinuous along the line y = x, provided the average value of the kernel were used in matrix B for x = y.
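The Trapezoidal entries of Table 13 can be regenerated with the sketch below, again assuming N abscissas on (0, 1) and $A_{ij} = K(y_i, y_j)\,w_j$. For the exact largest value it uses the standard result for this kernel on an interval of unit length (assumed here rather than copied from Ref. 3): $\lambda = 2/(1 + \omega^2)$, where $\omega$ is the smallest positive root of $\omega \tan(\omega/2) = 1$, found by bisection. The printed errors should shrink roughly like $1/N^2$, consistent with the cusp at x = y.

```python
import math
import numpy as np

def char_values(n):
    """Trapezoidal-rule characteristic values for K(x, y) = exp(-|x - y|) on (0, 1)."""
    y = np.linspace(0.0, 1.0, n)
    w = np.full(n, 1.0 / (n - 1))
    w[[0, -1]] *= 0.5
    s = np.sqrt(w)
    b = np.exp(-np.abs(y[:, None] - y[None, :]))
    return np.linalg.eigvalsh(s[:, None] * b * s[None, :])[::-1]

def largest_exact(tol=1e-14):
    """lambda_0 = 2 / (1 + om^2), with om the root of om * tan(om / 2) = 1 in (0, pi)."""
    lo, hi = 1e-9, math.pi - 1e-9
    f = lambda om: om * math.tan(om / 2.0) - 1.0   # increasing on (0, pi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    om = 0.5 * (lo + hi)
    return 2.0 / (1.0 + om * om)

lam_exact = largest_exact()
for n in (9, 20, 50):
    lam0 = char_values(n)[0]
    print(f"N={n:2d}  lambda_0={lam0:.6f}  error={lam_exact - lam0:.2e}")
```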

It  is  possible  to  apply  Simpson's  rule  or  Gauss  quadrature  to  this  particular 
kernel,  and  ignore  the  cusp.  However,  since  the  integrand  of  (2)  is  not  being 
well  approximated,  poorer  results  can  be  anticipated.  The  results  of  Table  14 
corroborate  this.  Slightly  poorer  characteristic  values  were  obtained  via 
Simpson's  rule  than  Gauss  integration;  however,  the  reverse  was  true  for  the 
characteristic  function  approximations.  The  Trapezoidal  rule  outperformed  both 
in  all  aspects,  however. 


Table 14

CHARACTERISTIC VALUES FOR N = 9

  Simpson     Gauss       Trapezoidal   Exact
  0.742329    0.742030    0.738619      0.738811
  0.141696    0.140302    0.139433      0.138004
  0.049161    0.048005    0.047308      0.045088
  0.026400    0.024658    0.023878      0.021329
  0.010096    0.016482    0.015078      0.012279

SUMMARY 

The  method  of  collocation  for  numerical  solution  of  Fredholm  integral 
equations  has  been  investigated  extensively  via  examples,  and  found  to  be 
workable and accurate when appropriate care is taken. For example, the selection
of  the  integration  rule  is  important,  and  the  way  non-existent  solutions  or 
discontinuous  solutions  manifest  themselves  in  the  numerical  approaches  considered 
must  be  known  and  anticipated. 

The method of solution of integral equations of the first kind, as limits of integral equations of the second kind, is found to be a very worthwhile method in the case where continuous solutions exist. When a discontinuous solution exists, and the discontinuous portion $g_2(x)$ can be estimated, the integral equation (11) can be put in the form (by letting $g(x) = g_1(x) + g_2(x)$)

$$\int_a^b dy\, K(x,y)\, g_1(y) = f(x) - \int_a^b dy\, K(x,y)\, g_2(y), \qquad a < x < b. \qquad (42)$$

The right-hand side of (42) is known (when $g_2(x)$ is known or estimated); thus (42) is an integral equation for $g_1(x)$, to which the numerical approach can be applied with more numerical accuracy. A recursive procedure, whereby $g_2(x)$ is estimated more and more accurately, is especially worthy of consideration.

The FORTRAN program used for the above computations is listed in the Appendix. The seven possible input variables are read in via NAMELIST and are defined within the comment section of the main program. Three external



subroutines, DQUAD, F, and TK, are used to give the quadrature formula weights and abscissas, to define the function $f(x)$, and to define the kernel $K(x,y)$, respectively. All computations are double precision, except the calculation of characteristic values and vectors, which is single precision (because the only routine now available for nonsymmetric matrices, EIGENP, is single precision).
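For readers without the Appendix's library routines (DQUAD, DGJH, EIGENP), the linear-equation path of the program can be sketched in Python/NumPy as below. This is a hedged translation, not the original code. The Trapezoidal rule stands in for DQUAD and `numpy.linalg.solve` stands in for DGJH. The test kernel $K(x,y) = xy$, with the right-hand side chosen so that $g(x) = 1$ is the exact solution, is a hypothetical example. As in the listing, the matrix is $A_{ij} = K(y_i, y_j)\,w_j$ with DALAMB subtracted on the diagonal, which discretizes $f(x) + \lambda g(x) = \int K(x,y)\,g(y)\,dy$.

```python
import numpy as np

def trapezoid(n, alpha, beta):
    """Stand-in for DQUAD: n equispaced abscissas and Trapezoidal weights."""
    y = np.linspace(alpha, beta, n)
    w = np.full(n, (beta - alpha) / (n - 1))
    w[[0, -1]] *= 0.5
    return y, w

def solve_fredholm(tk, f, dalamb, n, alpha=0.0, beta=1.0):
    """Solve f(x) + dalamb*g(x) = integral of tk(x, y) g(y) dy at the abscissas.

    dalamb = 0 gives an equation of the first kind; otherwise, the second kind."""
    y, w = trapezoid(n, alpha, beta)
    a = tk(y[:, None], y[None, :]) * w[None, :]    # A(I,J) = TK(Y(I),Y(J))*W(J)
    a -= dalamb * np.eye(n)                        # A(I,I) = A(I,I) - DALAMB
    return y, np.linalg.solve(a, f(y))             # right-hand side A(I,MC) = F(Y(I))

# Hypothetical test: K(x, y) = x*y and dalamb = -1, so g(x) = 1 requires
# f(x) = x/2 - dalamb * 1 = x/2 + 1.
y, g = solve_fredholm(lambda x, t: x * t, lambda x: x / 2 + 1.0, -1.0, n=11)
print(np.round(g, 12))
```

The same call covers an equation of the first kind (dalamb = 0), but the discretized matrix is then typically ill-conditioned, which is one motivation for the limiting approach described in the summary.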

REFERENCES

1. F. B. Hildebrand, Methods of Applied Mathematics, Prentice-Hall Inc., N.Y., Second Printing, 1954.

2. P. J. Davis and P. Rabinowitz, Numerical Integration, Blaisdell Publ. Co., Waltham, Mass., 1967.

3. C. W. Helstrom, Statistical Theory of Signal Detection, Pergamon Press, N.Y., 1960.




APPENDIX

      PARAMETER (NR=100, NC=NR+1)
      PARAMETER (M=50)
      DOUBLE PRECISION A(NR,NC),V(2),Y(NR),W(NR),F,TK
      DOUBLE PRECISION ALPHA,BETA,DALAMB
      DIMENSION JC(NC),B(M,M),EVR(M),EVI(M),VECR(M,M),VECI(M,M),INDIC(M)
      LOGICAL EIGEN
      NAMELIST /DATA/ EIGEN,DALAMB,ALPHA,BETA,LIMIT1,LIMIT2,LIMIT3
      LIMIT1 = 5
      LIMIT2 = 20
      LIMIT3 = 1
      EIGEN = .FALSE.
C
C ******************************************************************
C
C     IF EIGEN = .TRUE., THEN THE CHARACTERISTIC VALUE PROBLEM
C     IS TREATED:
C
C        DALAMB * G(X) = INTEGRAL ( TK(X,Y)*G(Y)*DY )
C
C     BOTH CHARACTERISTIC VALUES AND CHARACTERISTIC VECTORS ARE FOUND.
C     PRINT STATEMENTS MUST BE INSERTED FOR THIS CASE.
C
C ******************************************************************
C
C     IF EIGEN = .FALSE., THEN THE PROGRAM COMPUTES THE SOLUTION OF
C     FREDHOLM INTEGRAL EQUATIONS OF EITHER THE FIRST OR SECOND KIND.
C     THE INTERVAL OF INTEREST IS ASSUMED TO BE FROM ALPHA TO BETA.
C     F(X) IS THE FUNCTION WHOSE INVERSE IS SOUGHT.
C     TK(X,Y) IS THE KERNEL.  DALAMB IS THE SCALE FACTOR.
C
C     IF DALAMB = 0.D0, THE EQUATION IS OF THE FIRST KIND...
C        F(X) = INTEGRAL ( TK(X,Y)*G(Y)*DY )
C
C     IF DALAMB .NE. 0.D0, THE EQUATION IS OF THE SECOND KIND...
C        F(X) + DALAMB * G(X) = INTEGRAL ( TK(X,Y)*G(Y)*DY )
C
C ******************************************************************
C
C     IN ALL CASES, ANY SYMMETRY WHICH WOULD RESULT IN REDUNDANT
C     EQUATIONS IS ASSUMED TO BE REMOVED.  IF THIS IS NOT DONE, AN
C     ERROR RETURN FROM DGJH SHOULD OCCUR.
C
C ******************************************************************




C     THREE EXTERNAL FUNCTION SUBROUTINES ARE NECESSARY...
C
C     DQUAD(Y,W,N,ALPHA,BETA,$), WHERE...
C        Y IS THE ABSCISSA ARRAY FOR SOME QUADRATURE FORMULA
C        W IS THE WEIGHT ARRAY FOR SOME QUADRATURE FORMULA
C        N IS THE ORDER OF THE QUADRATURE FORMULA
C     TK(X,Y), WHERE...
C        Y IS THE VARIABLE OF INTEGRATION
C        X IS THE REMAINING ARGUMENT
C     F(X), WHERE...
C        X IS AS IN TK(X,Y)
C
C ******************************************************************
C
C     SEVEN VARIABLES MAY BE READ IN VIA NAMELIST.  AT THE END OF EACH
C     CASE, THE PROGRAM ATTEMPTS TO READ NEW DATA.
C
C ******************************************************************
C
    1 READ (3,DATA)
      WRITE (4,DATA)
      DO 10 N=LIMIT1,LIMIT2,LIMIT3
      CALL DQUAD(Y,W,N,ALPHA,BETA,$250)
      MC = N + 1
C     COLLOCATION MATRIX: A(I,J) = K(Y(I),Y(J)) * W(J)
      DO 101 I=1,N
      DO 101 J=1,N
      A(I,J) = TK(Y(I),Y(J))*W(J)
  101 CONTINUE
      IF (EIGEN) GO TO 125
      DO 99 I=1,N
   99 A(I,I) = A(I,I) - DALAMB
      DO 100 I=1,N
  100 A(I,MC) = F(Y(I))
      V(1) = 4.
      CALL DGJH(A,NC,NR,N,MC,$250,JC,V)
      PRINT 43, N, DALAMB
      PRINT 41, (Y(I),A(I,MC),I=1,N)
      GO TO 10
  250 PRINT 42




      PRINT 41, (Y(I),W(I),I=1,N)
      PRINT 41, V(1),V(2)
      PRINT 47, ((A(I,J),J=1,MC),I=1,N)
   10 CONTINUE
      PRINT 44
      IF (NR.NE.0) GO TO 1
  125 CONTINUE
C     COPY THE N BY N COLLOCATION MATRIX INTO B (N .LE. M IS REQUIRED)
      DO 30 I=1,N
      DO 30 J=1,N
   30 B(I,J) = A(I,J)
      CALL EIGENP(N,M,B,27,EVR,EVI,VECR,VECI,INDIC)
C
C     EIGENP FINDS CHAR. VALUES AND VECTORS FOR A REAL NONSYMMETRIC
C     MATRIX.  EVR AND EVI CONTAIN THE REAL AND IMAGINARY PARTS OF THE
C     CHAR. VALUES; VECR AND VECI CONTAIN THE REAL AND IMAG. PARTS OF
C     THE CHAR. VECTORS.
C     REFERENCE... USL TECH MEMO 2070-163, 12 MAY 1969.
C
C     AT THIS POINT, APPROPRIATE PRINT STATEMENTS SHOULD BE INSERTED.
C
C ******************************************************************
      IF (NR.NE.0) GO TO 1
   41 FORMAT (/2D20.18)
   42 FORMAT (' ERROR RETURN ')
   43 FORMAT (1H1,//,13X,'ABSCISSA',20X,'ORDINATE',20X,I5,' POINTS',10X,
     1 'DALAMB = ',D16.12,//)
   44 FORMAT (1H1)
   47 FORMAT (/6D21.14)
  111 STOP
      END




Solution  Of  Large  Hermitian  Eigenproblems 
On  Virtual  And  Cache  Memory  Computers 


R.  L.  Streit 




TECHNICAL  NOTES 




Roy L. Streit

Naval  Underwater  Systems  Center 
New  London  Laboratory 
New  London,  CT  06320 


The  impact  of  cache  and  virtual  memories  on  the 
performance  of  state-of-the-art  software  has  not 
yet  been  fully  explored.  This  letter  documents 
the  truth  of  this  statement  by  presenting  the 
results  of  a  computational  experiment  performed  on 
the  VAX  11/780  (a  computer  possessing  both  these 
features)  with  well  established  and  documented 
FORTRAN  subroutines  designed  for  high  accuracy 
numerical  solution  of  the  ordinary  Hermitian 
eigenproblem $Az = \lambda z$. Easily implemented changes
in  the  storage  allocation  of  the  complex  matrix  A 
resulted  in  roughly  half  an  order  of  magnitude 
improvement  in  both  elapsed  CPU  and  wall  clock 
time,  as  well  as  in  significantly  improved  overall 
system  throughput.  These  changes  do  not  affect 
the  numerical  algorithm  in  any  way. 

The particular programs discussed here were written by International Mathematical and Statistical Libraries, Inc. [1], hereinafter referred to as IMSL. The storage allocation scheme for the complex matrix A used in the IMSL routines is similar to that used earlier in the EISPACK routines developed by Argonne National Laboratory [2]. The EISPACK routines were not resident on the available computer, so only the IMSL routines were used here. However, it is highly probable that the observations recorded below would be substantially unchanged had the EISPACK routines been used instead of the IMSL routines.

The most natural way to store the complex Hermitian matrix A is simply to declare A to be COMPLEX. Thus, if $a_{ij}$ is a particular entry in the matrix A, then the address of the real part of $a_{ij}$ and the address of the imaginary part of $a_{ij}$ differ numerically by precisely 1; that is, the real and imaginary parts of $a_{ij}$ are adjacent to each other in computer memory.

The approach taken by both EISPACK and IMSL is to separate the matrix A into two real matrices, AR and AI, which contain all the real and imaginary parts, respectively, of A. Thus, we have the matrix equation A = AR + i AI, with AR and AI typed REAL. The primary drawback to this method is that the address of the real part of $a_{ij}$ and the address of the imaginary part of $a_{ij}$ differ numerically by at least $n^2$, where n is the order of the eigenproblem; that is, the real and imaginary parts of $a_{ij}$ are widely separated from each other in computer memory for large-order eigenproblems.
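The address arithmetic behind this drawback can be made concrete. The sketch below assumes Fortran column-major order and 1-based indices, and counts offsets in REAL storage units. The function names are illustrative only. It compares the offset between the real and imaginary parts of $a_{ij}$ when A is declared COMPLEX and when A is split into AR and AI, with AI assumed to follow AR immediately in memory (the best case for the split layout).

```python
def complex_offsets(i, j, n):
    """Word offsets of Re(a_ij) and Im(a_ij) in a COMPLEX n-by-n array."""
    k = (i - 1) + (j - 1) * n          # element index in column-major order
    return 2 * k, 2 * k + 1            # real and imaginary parts are adjacent

def split_offsets(i, j, n):
    """Word offsets of AR(i,j) and AI(i,j) when AI is stored right after AR."""
    k = (i - 1) + (j - 1) * n
    return k, n * n + k                # separated by the whole n*n array AR

n = 210
for layout in (complex_offsets, split_offsets):
    re, im = layout(5, 7, n)
    print(f"{layout.__name__}: gap = {im - re} words")
```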

For large-order eigenproblems, this storage method can cause one cache miss for each arithmetic operation performed, as well as a very significant number of page faults to disk, thus considerably increasing both CPU and wall clock time on a computer such as the VAX. These concerns were discussed with Thomas Aird of IMSL, who responded by suggesting [3] several easily effected modifications to the appropriate IMSL routines (namely, EIGCH, EHOUSH, and EHBCKH), which force the real and imaginary parts of every entry $a_{ij}$ of the matrix A to be stored in adjacent memory locations throughout the computations. These were implemented and found to work as anticipated.*

The solution of an order n = 210 eigenproblem was computed in an applications program [4] using both the standard IMSL code and the modified IMSL code. Other than in the IMSL routines, the actual code executed was identical. Both executions were performed in a dedicated environment; i.e., no other users were permitted during program execution. Thus the program overhead (setting of the matrices, I/O operations, etc.) and the system overhead were identical in both cases. Table 1 shows that the modified routines significantly outperformed the standard routines.

                              IMSL       MODIFIED IMSL
  Elapsed CPU time            56m 34s    25m 39s
  Elapsed wall clock time     61m 57s    27m 17s
  Page fault count            15,274     260
  Direct I/O count            1,112      1,111
  Buffered I/O count          157        157
  Peak virtual size (pages)   2,465      2,465
  Working set size (pages)    1,024      1,024
  Mounted volumes             0          0

Table 1: Eigenproblem Order n = 210

It is also clear from Table 1 that the page fault count is not significant in this particular problem. Improved performance is due primarily to improved use of cache memory. The VAX transfers two contiguous words (64 bits) on a cache miss. Thus, if a complex number, say z, is needed for an arithmetic operation and if z is not in cache, then z will be found with only one cache miss if

*Ed. Note: I have been informed by IMSL that the next version of their library, in which the arrays are stored in COMPLEX mode, will eliminate this problem.




the real and imaginary parts are stored in adjacent memory locations. In any other storage mode, moving z into cache memory would cost two cache misses. Thus, it is fair to say that every arithmetic operation resulted in a cache miss. If there were $50n^3$ arithmetic operations and a cache miss costs 1.8 µs, then this assumption accounts for 17 minutes of CPU time savings. The remainder of the observed improvement has not been accounted for.
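The one-miss versus two-miss argument above can be illustrated with a toy model. The sketch assumes a cold cache, 4-byte REAL words, and the 64-bit (8-byte) transfer per VAX cache miss mentioned above. It counts the distinct cache lines that must be fetched to obtain both parts of each complex entry, treating every entry independently (no reuse between entries). It is a counting illustration, not a model of the VAX 11/780 cache hardware.

```python
WORD = 4            # bytes per REAL word (single precision assumed)
LINE = 8            # bytes fetched per cache miss: two contiguous words

def lines_for_entry(byte_re, byte_im, line=LINE):
    """Distinct cache lines holding the real and imaginary parts of one entry."""
    return len({byte_re // line, byte_im // line})

def misses(layout, n):
    """Cold-cache misses to fetch every entry of an n-by-n complex matrix once."""
    total = 0
    for k in range(n * n):
        if layout == "complex":              # Re and Im adjacent (COMPLEX array)
            re, im = 2 * k * WORD, (2 * k + 1) * WORD
        else:                                # split storage: AR followed by AI
            re, im = k * WORD, (n * n + k) * WORD
        total += lines_for_entry(re, im)
    return total

n = 210
print(misses("complex", n), misses("split", n))
```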

The  applications  program  was  executed  again,  but 
this  time  the  solution  of  an  eigenproblem  of  order 
n  =  420  was  required.  This  eigenproblem  was 
solved  using  both  the  standard  and  modified  IMSL 
routines  as  before.  A  significant  difference  is 
that  other  users  were  freely  permitted  on  the 
system  throughout  both  of  the  program  executions 
whose  results  are  presented  in  Table  2.  In  this 
case,  page  faults  may  well  have  had  the  more 
significant  impact  on  program  performance. 

                              IMSL           MODIFIED IMSL
  Elapsed CPU time            12h 15m 48s    3h 51m 52s
  Elapsed wall clock time     20h 53m 09s    13h 14m 55s
  Page fault count            9,453,587      997,395
  Direct I/O count            3,760          3,860
  Buffered I/O count          264            264
  Peak virtual size (pages)   8,707          8,707
  Working set size (pages)    1,024          1,024
  Mounted volumes             0              0

Table 2: Eigenproblem Order n = 420

In  a  multiuser  environment,  page  faults  to  disk 
tie  up  a  disk  controller  for  several  tens  of 
milliseconds  and  the  large  eigenproblems  described 
in  this  letter  significantly  reduced  overall 
system  throughput.  In  the  broad  view  this  is 
probably  a  more  serious  problem  than  individual 
program  CPU  times. 

The contents of this letter were first documented in [5].


References

1. IMSL Library, International Mathematical and Statistical Libraries, Inc., 7500 Bellaire Boulevard, Houston, Texas.

2. B. T. Smith, et al., Matrix Eigensystem Routines - EISPACK Guide, Lecture Notes in Computer Science, Vol. 6, Second Edition, Springer-Verlag, 1976.

3. Thomas Aird, private communication, 14 November 1979.

4. R. L. Streit, "Array Optimization Using Subarrays," NUSC Technical Report 5889, Naval Underwater Systems Center, New London Laboratory, New London, Connecticut, 23 March 1979.

5. R. L. Streit and B. G. Buehler, "The Efficient Solution of Large Hermitian Eigenproblems on Virtual and Cache Memory Computers," NUSC Technical Memorandum No. 801019, Naval Underwater Systems Center, New London, Connecticut, 29 January 1980.




